If you are considering using a "distance to" variable in a housing demand regression, below the fold you can find the cliff notes from the paper on why direct interpretation of these variables is misleading.
Here is a very common applied research scenario: You have data on a bunch of housing sales in a neighborhood or metropolitan area. You think there is some significant landmark that may be relevant, perhaps downtown or some toxic waste dump along the periphery. To control for this, you calculate every observation's distance from said site and include it in your regression as "Distance to CBD" or "Distance to Toxic Waste Dump." The expectation is often that the coefficient will tell you how these things influence housing prices. As our paper demonstrates, this is not a correct interpretation.
Q1: Why is it not correct?
As long as distance to something matters, then distance to virtually anything will also matter in a regression. One of the easiest ways to think of this is with the Toxic Waste Dump on the periphery example. Suppose on the periphery directly opposite to the waste dump is a park. Is the coefficient picking up your proximity to the park, or distance away from the waste dump? Unless you are very familiar with the area, there are probably many of these types of landmarks that you are not even aware of as a researcher.
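To make the park-versus-dump example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the square city, the landmark coordinates, the price equation, the noise level); none of it comes from the paper. Prices are generated so that only distance to the park matters, yet a regression on distance to the dump alone still returns a sizable coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical city: houses scattered uniformly over a square.
xy = rng.uniform(-1.0, 1.0, size=(n, 2))

# Hypothetical landmarks on opposite edges of the city.
park = np.array([1.0, 0.0])
dump = np.array([-1.0, 0.0])

dist_park = np.linalg.norm(xy - park, axis=1)
dist_dump = np.linalg.norm(xy - dump, axis=1)

# Assumed data-generating process: ONLY the park affects price.
price = 100.0 - 10.0 * dist_park + rng.normal(0.0, 1.0, size=n)

# The researcher does not know about the park and regresses
# price on a constant and distance to the dump.
X = np.column_stack([np.ones(n), dist_dump])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
dump_coef = beta[1]

print(f"coefficient on distance to dump: {dump_coef:.2f}")
```

The dump has no effect in this simulated world, but because being far from the dump means being close to the park, the coefficient on distance to the dump comes out positive.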
Q2: What does the distance variable tell us, if not the value of a landmark?
Basically, it gives you an indication of where the optimal location lies. The coefficient reflects where each observation sits relative to the most (or least) desirable area, not the value of the landmark itself.
Ultimately, you can think of a distance variable as a line moving through space. Any other line you happen to draw through space will have an angular relationship with it that will (unless it is orthogonal) influence the coefficient on your distance variable. If there are multiple competing locations, these lines become collinear if they are all included, but if you only use a subset, the coefficient becomes a weighted average of the dominant influence(s).
Q3: Ok, if distance to something matters, can we just treat it as a proxy variable, or otherwise use it to obtain unbiased estimates of the other coefficients?
Yes, but once you realize this is what you are functionally doing, you are much better off by formally modeling it rather than thinking of it as a proxy variable. Include latitude, longitude, and their quadratic counterparts. The resulting coefficients will allow you to solve for the max/min coordinates.
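A minimal sketch of that approach follows, assuming simulated data, a hypothetical price peak at (0.3, -0.2), and a specification with squared terms only (no latitude-longitude interaction). With price = b0 + b1*lon + b2*lat + b3*lon^2 + b4*lat^2, setting the two first derivatives to zero gives the stationary point lon* = -b1 / (2*b3) and lat* = -b2 / (2*b4). It is a maximum when b3 and b4 are both negative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical coordinates, already centered on the study area.
lon = rng.uniform(-1.0, 1.0, size=n)
lat = rng.uniform(-1.0, 1.0, size=n)

# Assumed data-generating process: price peaks at (0.3, -0.2).
true_lon, true_lat = 0.3, -0.2
price = (100.0
         - 5.0 * ((lon - true_lon) ** 2 + (lat - true_lat) ** 2)
         + rng.normal(0.0, 1.0, size=n))

# Regress price on a constant, the coordinates, and their squares.
X = np.column_stack([np.ones(n), lon, lat, lon ** 2, lat ** 2])
b0, b1, b2, b3, b4 = np.linalg.lstsq(X, price, rcond=None)[0]

# Solve the first-order conditions for the stationary point.
lon_star = -b1 / (2.0 * b3)
lat_star = -b2 / (2.0 * b4)
is_maximum = (b3 < 0) and (b4 < 0)

print(f"estimated optimum: ({lon_star:.3f}, {lat_star:.3f}), "
      f"maximum: {is_maximum}")
```

In real data you would add your other housing controls to the same regression. Centering the coordinates first, as assumed here, keeps the squared terms from being nearly collinear with the linear ones.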
Q4: Are all "distance to" variables like this?
No, some distance variables are actually "distance to nearest." The problem emerges when every observation's distance is measured to the same common point. Suppose instead there are multiple pollution emitters, and the variable measures the distance to the nearest one. This is ok, because the variable is not capturing each observation's relative position in space.
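For reference, a "distance to nearest" variable can be built as in the sketch below. The house and emitter coordinates are made-up planar points used purely for illustration; with real latitude and longitude you would want a proper geographic distance instead of the Euclidean one assumed here.

```python
import numpy as np

# Hypothetical planar coordinates (e.g. kilometers on a local grid).
houses = np.array([[0.0, 0.0],
                   [3.0, 4.0]])
emitters = np.array([[0.0, 1.0],
                     [3.0, 0.0]])

# Pairwise differences via broadcasting: shape (n_houses, n_emitters, 2).
diff = houses[:, None, :] - emitters[None, :, :]

# Euclidean distance from every house to every emitter.
all_dist = np.linalg.norm(diff, axis=2)

# Keep only the distance to the closest emitter for each house.
dist_nearest = all_dist.min(axis=1)

print(dist_nearest)
```

Because different houses are matched to different emitters, the resulting variable is not measured from a single common point.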
Q5: Can I just fix this problem with a spatial autoregressive (SAR) or spatial error model (SEM)?
No, the standard SAR and SEM are designed to capture an interdependent relationship (like a spatial multiplier), not a directional effect or an optimal position in space.
Q6: I would really like to know the welfare significance of some local amenity. Is there a workaround to get at this without the "distance to" variable?
We do intend that to be the next stage in our research and we have some specific ideas, but nothing we're willing to market at this point (this point being Oct 27, 2009).