Bayesian Model Averaging (BMA) with uncertain Spatial Effects
This tutorial illustrates the computer code needed to use spatial filtering in the context of Bayesian Model Averaging (BMA). For more details, and in case you use the code, please cite Crespo Cuaresma and Feldkircher (2010).
In addition, this tutorial is also available in PDF form: web_tutorial_spatfilt.pdf
Installing spatBMS
To get the code started you need to install the R packages spdep and BMS (version ≥ 0.2.5). Then install the add-on package spatBMS by downloading one of the following binaries according to your system:
- Mac OS / Linux: spatBMS_0.0.tar.gz
- Windows with R Version 2.10 or greater: spatBMS.zip
- Windows with R Version 2.5 to 2.9: spatBMS.zip
For further information on installing spatBMS manually, consult the corresponding page on installing BMS.
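Assuming the downloaded archive sits in your current working directory, the installation might look like the following sketch (the file name matches the Mac OS / Linux download above; Windows users would point to the .zip file instead):

```r
# Install the dependencies from CRAN first
install.packages(c("spdep", "BMS"))
# Then install the downloaded spatBMS archive from the local file
# (file name as given in the download list above)
install.packages("spatBMS_0.0.tar.gz", repos = NULL, type = "source")
```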
The following text has been tested with R 2.11.
A Primer to Spatial Filtering and BMA
Consider a cross-sectional regression of the following form:

y = αιN + Xkχk + ρWy + ε,   ε ~ N(0, σ²IN)   (1)

where y is an N-dimensional column vector of the dependent variable, α is the intercept term, ιN is an N-dimensional column vector of ones, Xk = (x1,…,xk) is a matrix whose columns are stacked data for k explanatory variables and χk = (χ1,…,χk)′ is the k-dimensional parameter vector corresponding to the variables in Xk. The spatial autocorrelation structure is specified via a spatial weight matrix W. The coefficient ρ attached to W reflects the degree of spatial autocorrelation. Equation (1) constitutes a parametric spatial model where the spatial parameter ρ is often interpreted as a spillover parameter.
In this setting, on top of the uncertainty regarding the choice of explanatory variables, an extra degree of uncertainty arises: we do not know the actual nature of the spatial interactions which we model through the spatial autoregressive term in equation (1), since we conduct inference conditional on W. Spatial autocorrelation will be observable whenever the phenomenon under study is a spatial process or omitted variables cause spatial variation in the residuals (Tiefelsdorf and Griffith, 2007). Note that both arguments typically apply to economic cross-section data, where economic units interact with each other and omitted variables decrease the level of confidence in econometric analysis. Since inference from the SAR model is conditional on the weight matrix W, which has to be exogenously specified, explicitly accounting for this source of model uncertainty is a natural generalization of the uncertainty about the composition of Xk in the framework of BMA. In most applications there is little theoretical guidance on which structure to put on the weight matrix, rendering its specification a serious challenge.
Spatial Filtering
The spatial filtering literature seeks to remove residual spatial autocorrelation patterns prior to estimation and is in principle not interested in directly estimating ρ in equation (1). The approach put forward by Getis and Griffith (2002) and Tiefelsdorf and Griffith (2007) is based on an eigenvector decomposition of a transformed W matrix, where the transformation depends on the underlying spatial model. The eigenvectors ei are included as additional explanatory variables and the regression equation (1) becomes:

y = αιN + Xkχk + Σi ηi ei + ε   (2)

where each eigenvector ei spans one of the spatial dimensions. By introducing the eigenvectors into the regression, we explicitly take care of (remaining) spatial patterns in the residuals. Furthermore, spatial commonalities among the covariates in Xk are conditioned out. This reduces the degree of multicollinearity and further separates spatial effects from the ‘intrinsic’ impact the employed regressors exert on the dependent variable.
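To give a rough flavor of the idea, the following toy sketch (not using spatBMS, and with a made-up 5-unit chain contiguity matrix) extracts candidate eigenvectors by hand from the doubly-centred weight matrix MWM, with M the usual centring projection matrix:

```r
# Toy sketch: candidate spatial filtering eigenvectors for a small
# binary weight matrix W, obtained from the doubly-centred matrix MWM.
n <- 5
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1      # chain contiguity: units i and i+1 are neighbors
M <- diag(n) - matrix(1, n, n) / n     # centring projection matrix
E <- eigen(M %*% W %*% M, symmetric = TRUE)
round(E$values, 3)                     # eigenvectors (columns of E$vectors) span spatial patterns
```

The eigenvectors associated with large eigenvalues represent pronounced spatial patterns and are the ones considered as additional regressors.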
BMA with uncertain spatial effects
From a Bayesian perspective, the problem of obtaining estimates of the parameter associated with a covariate under uncertainty in both the nature of W and Xk can be handled in a straightforward manner using spatial filtering techniques. Let us assume that we are interested in a particular regression coefficient, β. Denote the set of potential models by M = {M11, …, M1 2^K, M21, …, M2 2^K, …, MZ1, …, MZ 2^K}, where K stands for the number of potential explanatory variables and Z for the number of candidate spatial weighting matrices Wz, z = 1,…,Z, each with an associated set of eigenvectors Ez. The cardinality of M is therefore 2^K · Z. A particular model, say Mzk, is characterized by its parameter vector θzk = (α, χk, ηz) corresponding to the intercept term included in all models, the coefficients on the regressors entering the model and the coefficients on the set of eigenvectors Ez related to Wz. In the BMA framework, the posterior distribution of β now takes the form

p(β | y) = Σz=1,…,Z Σj=1,…,2^K p(β | Mzj, y) p(Mzj | y)

with y denoting the data and β the coefficient of interest. Inference on β is thus based on single inferences under the Z · 2^K candidate models, weighted by their respective posterior model probabilities, p(Mzj | y), which in turn depend on the corresponding matrix of spatial weights. For more technical details please see Crespo Cuaresma and Feldkircher (2010).
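To make the averaging concrete, here is a toy numeric example (all values made up) of how a model-averaged posterior mean and a posterior inclusion probability follow from conditional results and posterior model probabilities:

```r
# Toy example with made-up numbers: three candidate models
pmp      <- c(0.6, 0.3, 0.1)    # posterior model probabilities p(M_zj | y)
postmean <- c(0.50, 0.45, 0.00) # posterior means of beta conditional on each model
# model-averaged posterior mean of beta
sum(pmp * postmean)             # 0.435
# posterior inclusion probability if beta enters models 1 and 2 only
sum(pmp[1:2])                   # 0.9
```

The same weighting applies over models that differ only in the spatial weight matrix, which is how W-uncertainty is integrated out.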
An Illustration: The Boston Housing Data
In R we will do spatial filtering with the ‘Boston housing data’, which was originally published in Harrison and Rubinfeld (1978). The dependent variable (CMEDV) contains the corrected median value of owner-occupied homes in USD 1,000s for 506 observations. Among the explanatory variables we have per capita crime (CRIM), the pupil-teacher ratio by town (PTRATIO), the proportion of owner-occupied units built prior to 1940 (AGE), the proportion of non-retail business acres per town (INDUS), a variable that is proportional to the share of African Americans per town (B) and a variable reflecting the nitric oxides concentration (NOX), among others. For more details please see ?dataBoston. We start by loading the data in R:
library(spatBMS)
data(dataBoston)
data(boston.soi)
As in earlier analyses of these data, we take logarithms of the variables CMEDV, DIS, RAD and LSTAT and squares of the regressors RM (RM#RM) and NOX (NOX#NOX) to model potential non-linearities. These transformations have already been carried out and the transformed variables are stored in dataBoston.
We proceed with the construction of several weight matrices. To keep the example simple, we limit the space of candidate W matrices to five:
- The matrix boston.soi already comes along with the boston data set (W0).
- A perturbation of the boston.soi matrix (W1).
- A second perturbation of the boston.soi matrix (W2).
- A 4 nearest neighbor matrix (KNN4).
- A 6 nearest neighbor matrix (KNN6).
The first matrix is included in the spdep package and is presumably a first-order contiguity matrix. This class of matrices assigns positive (identical) weights to observations that share a common border. See Anselin (1988) and Crespo Cuaresma and Feldkircher (2010) for an interpretation of different coding schemes.
For the sake of illustration we will randomly perturb the matrix using the function jitterW_binary, which is provided in the Sweave file corresponding to this PDF. The perturbation randomly adds or drops N=5 neighborhood relationships of the boston.soi weight matrix. The function takes a matrix object as its argument, thus we have to transform boston.soi into a matrix with the function nb2mat.
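Since jitterW_binary ships only with the Sweave file, here is a hypothetical sketch of what such a perturbation function might look like; the actual implementation may differ in detail:

```r
# Hypothetical sketch of jitterW_binary (the real function is provided in
# the Sweave file accompanying the tutorial): toggle N randomly chosen
# neighborhood relations in a binary weight matrix, keeping the matrix
# symmetric with a zero diagonal.
jitterW_binary <- function(Wmat, N = 5) {
  cand <- which(upper.tri(Wmat))               # off-diagonal pairs (i < j)
  flip <- sample(cand, N)                      # N relations to add or drop
  Wmat[flip] <- 1 - Wmat[flip]                 # toggle the chosen links
  Wmat[lower.tri(Wmat)] <- t(Wmat)[lower.tri(Wmat)]  # restore symmetry
  list(Wmat = Wmat)                            # accessed as W1$Wmat below
}
```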
W0 <- boston.soi
bostMat <- nb2mat(boston.soi, style = "B")
W1 <- jitterW_binary(bostMat, N = 5)
W1 <- mat2listw(W1$Wmat)
W1 <- W1$neighbours
W2 <- jitterW_binary(bostMat, N = 5)
W2 <- mat2listw(W2$Wmat)
W2 <- W2$neighbours
It is necessary to re-transform the perturbed matrices into the nb class since the SpatialFiltering function we will use later on only accepts objects of this class as input. This can be done with the function mat2listw, extracting the neighborhood object by typing object$neighbours (in the example W2$neighbours).
Finally we set up two k-nearest neighbor matrices.
coords <- coordinates(boston.utm)
col.knn4 <- knearneigh(coords, k = 4)
col.knn4 <- knn2nb(col.knn4)
col.knn6 <- knearneigh(coords, k = 6)
col.knn6 <- knn2nb(col.knn6)
As stated above, we assume the model follows a SAR process. To free the residuals from spatial correlation we now filter the dependent variable. The SpatialFiltering function provided in the package spdep will extract the eigenvectors; a linear combination of these eigenvectors allows us to separate spatial correlation from the dependent variable by using the identified eigenvectors as additional regressors in our econometric model. Depending on the size of your W matrix, spatial filtering can take a while. The function takes the neighborhood objects we have defined above and the data to be filtered as its main arguments. For more details see ?SpatialFiltering. Note also that we have set ExactEV to FALSE (quicker), which provides an approximation sufficient for our illustration example.
y = as.data.frame(dataM[, 1, drop = FALSE])
yFilt.colGal0 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W0, style = "W", ExactEV = FALSE)
yFilt.colGal1 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W1, style = "W", ExactEV = FALSE)
yFilt.colGal2 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W2, style = "W", ExactEV = FALSE)
yFilt.knn4 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn4, style = "W", ExactEV = FALSE)
yFilt.knn6 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn6, style = "W", ExactEV = FALSE)
Finally we collect the eigenvectors in a list.
WL.boston = list(Col0 = fitted(yFilt.colGal0),
                 Col1 = fitted(yFilt.colGal1),
                 Col2 = fitted(yFilt.colGal2),
                 knn4 = fitted(yFilt.knn4),
                 knn6 = fitted(yFilt.knn6))
Note that you can also access the extracted eigenvectors by typing e.g. yFilt.colGal0$dataset instead of fitted(yFilt.colGal0).
Applying BMA
Now we can start with the BMA part. All functions and input arguments of the BMS library are applicable. There are a few important exceptions though: First, the empirical Bayes estimation as well as the hyper-g priors are so far not implemented. Thus you can specify the g-prior either by using the benchmark priors (Fernandez et al. 2001) or by any numeric value g > 0; see Feldkircher and Zeugner (2009) for more details on the influence of g on posterior results. Secondly, the enumeration algorithm (mcmc=enum) is in its current version not available for spatFilt.bms. Finally, the option force.full.ols=FALSE, which BMS provides to speed up calculations, is not available in spatFilt.bms.
The additional argument WList of the function spatFilt.bms must be a list object whose length corresponds to the number of weight matrices you use. length(WList) must be greater than 1, that is, you have to submit at least two sets of eigenvectors in order to use spatial filtering in the context of BMA. Each element of the list contains a matrix with the extracted eigenvectors, where the matrices do not have to have the same column dimension. In the example we have collected the eigenvectors in the object WL.boston. To have a quick look at the Boston data set we run a short BMA chain with 1 million posterior draws (iter=1e06) after discarding the first 100,000 draws (burn=1e05). For more information regarding the other function arguments type ?spatFilt.bms.
dataM = as.matrix(apply(dataM, 2, as.numeric))
model1 = spatFilt.bms(X.data = dataM, WList = WL.boston, burn = 1e+05,
                      iter = 1e+06, nmodel = 100, mcmc = "bd", g = "bric",
                      mprior = "random", mprior.size = (ncol(dataM) - 1)/2)
The object model1 is a standard BMS object. It is important to note that all reported statistics (posterior inclusion probabilities, posterior means, posterior standard deviations, etc.) are calculated after having integrated out uncertainty with respect to W. To fix ideas, we will first look at the disaggregated results, for example to assess whether a variable receives posterior support only under a particular weight matrix, or to look at the posterior inclusion probabilities of the spatial weight matrices:
model1$wTopModels[, 1:5]
               08beb0    08b6b0     08feb0     09beb0     08bef0
CRIM        1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
ZN          0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
INDUS       0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
CHAS        0.0000000 0.0000000 0.00000000 1.00000000 0.00000000
AGE         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
DIS         0.0000000 0.0000000 1.00000000 0.00000000 0.00000000
RAD         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
TAX         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
PTRATIO     1.0000000 0.0000000 1.00000000 1.00000000 1.00000000
B           1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
LSTAT       1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
NOX         0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
RM          1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
NOX#NOX     0.0000000 0.0000000 0.00000000 0.00000000 1.00000000
RM#RM       1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
W-Index     1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
PMP (Exact) 0.4700282 0.1026632 0.04539982 0.04329733 0.03871103
PMP (MCMC)  0.4676800 0.1037210 0.04752609 0.04262372 0.03858353
In the example we show posterior results for the first 5 models. The line W-Index tells you which W has been included in the particular model. In our example the W-Index indicates that the first weight matrix in WL.boston (i.e. W0 = 'boston.soi') has been used in the first 5 regression models. In contrast,
topmodels.bma(model1)[, 1:3]
                45f5      45b5       47f5
CRIM        1.000000 1.0000000 1.00000000
ZN          0.000000 0.0000000 0.00000000
INDUS       0.000000 0.0000000 0.00000000
CHAS        0.000000 0.0000000 0.00000000
AGE         1.000000 1.0000000 1.00000000
DIS         0.000000 0.0000000 1.00000000
RAD         1.000000 1.0000000 1.00000000
TAX         1.000000 1.0000000 1.00000000
PTRATIO     1.000000 0.0000000 1.00000000
B           1.000000 1.0000000 1.00000000
LSTAT       1.000000 1.0000000 1.00000000
NOX         0.000000 0.0000000 0.00000000
RM          1.000000 1.0000000 1.00000000
NOX#NOX     0.000000 0.0000000 0.00000000
RM#RM       1.000000 1.0000000 1.00000000
PMP (Exact) 0.465327 0.1056594 0.04672478
PMP (MCMC)  0.463265 0.0999770 0.04399500
shows the aggregated results: a matrix containing the best models along with the corresponding posterior model probabilities (exact and MCMC frequencies) after integrating out uncertainty with respect to the weight matrices. That is, if the first two models were identical in terms of explanatory variables but differed in the employed weight matrix, the posterior mass of the two models would be aggregated.
estimates.bma(model1)
             PIP     Post Mean       Post SD Cond.Pos.Sign Idx
B       1.000000  5.662607e-04  9.677364e-05    1.00000000  10
LSTAT   1.000000 -1.794610e-01  1.951711e-02    0.00000000  11
CRIM    0.999997 -4.484776e-03  8.450216e-04    0.00000000   1
RM#RM   0.999995  3.755742e-02  6.807775e-03    1.00000000  15
RM      0.997460 -3.609063e-01  8.758833e-02    0.00000501  13
AGE     0.990832 -1.486823e-03  4.082490e-04    0.00000000   5
RAD     0.973028  6.094649e-02  1.926394e-02    1.00000000   7
TAX     0.943864 -2.723645e-04  1.056064e-04    0.00000000   8
PTRATIO 0.832582 -1.114472e-02  6.486644e-03    0.00000000   9
DIS     0.107883 -4.118105e-03  1.944304e-02    0.00549670   6
CHAS    0.089263 -1.408996e-03  8.414781e-03    0.00000000   4
NOX#NOX 0.085315 -8.218984e-03  9.314296e-02    0.02434507  14
INDUS   0.081432 -4.726262e-05  6.037751e-04    0.06234650   3
NOX     0.081259  2.521705e-03  1.183834e-01    0.23021450  12
ZN      0.078215  4.135538e-06  1.114933e-04    0.82154318   2
In the same vein, the posterior inclusion probability (PIP) of the variable RM, for example, corresponds to the sum of the posterior model probabilities of all regression models including that variable, and the posterior mean is calculated as the weighted (by posterior model probabilities) average of posterior means over all weight matrices. Also note that the prior over the W space is uniform.
Coming back to the disaggregated (W-specific) results: to calculate the posterior inclusion probabilities of the W matrices, you can look at the frequency with which each W matrix has been visited by the sampler and express this in percentages:
model1$Wcount
  Col0   Col1   Col2   knn4   knn6
992344   3773   3883      0      0

model1$Wcount/sum(model1$Wcount) * 100
   Col0    Col1    Col2    knn4    knn6
99.2344  0.3773  0.3883  0.0000  0.0000
In the example, the original 'boston.soi' W matrix along with its perturbations receives overwhelming posterior support, whereas the k-nearest-neighbor matrices cannot explain the spatial patterns present in the data. If you prefer statistics based on the best models (in the example we have set nmodel=100, meaning that results are based on a maximum of 100 models receiving the highest posterior support in terms of posterior model probabilities), you can use the function pmpW.bma:
pmpW.bma(model1)
     PMP (Exact) PMP (MCMC)
Col0  99.5443044 99.5790237
Col1   0.2278478  0.2064531
Col2   0.2278478  0.2145232
knn4   0.0000000  0.0000000
knn6   0.0000000  0.0000000
Usually model1$Wcount gives you a reasonable approximation and is directly accessible from the model1 object.
We finally want to check whether there is remaining spatial autocorrelation present in the residuals. For that purpose we can use Moran's I test, calling lm.morantest for the best models. Although - in order to save time - we could look at the 10 best models only (in the example the first 10 models already account for more than 80% of the posterior mass), we carry out the residual test for all models in order to get more accurate results. The function moranTest.bma used below is a wrapper for the function lm.morantest provided in the package spdep. The loop re-computes the nmodel best regressions - in terms of posterior model probabilities - in order to get an object of class lm and finally applies lm.morantest.
We do this once for the eigenvector-augmented regressions (the spatial filtering approach) and once in a pure OLS fashion by setting the option variants="double":
mTest = moranTest.bma(object = model1, variants = "double",
                      W = nb2listw(boston.soi), nmodel = 100)
This allows a direct comparison of a non-spatial regression approach with the spatial filtering BMA approach pursued in this article. If the non-spatial linear regression models do not show any patterns of spatial residual autocorrelation, a standard BMA regression - as with the function bms - dealing solely with uncertainty with respect to the explanatory variables might be preferable.
We now extract the corresponding p-values (remember the null hypothesis corresponds to no spatial autocorrelation):
pvalueMat = cbind(sapply(mTest$moran, function(x) x$p.value),
                  sapply(mTest$moranEV, function(x) x$p.value))
Figure 1 shows the distribution of the p-values of the 'Moran's I' test, once for pure OLS regressions (without any spatial correction, left panel) and once augmented with the eigenvectors identified by the spatial filtering algorithm (right panel).
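A figure like Figure 1 can be produced from pvalueMat along the following lines. The plotting code is a sketch and pvalueMat is simulated here as a stand-in (the real matrix is computed above from mTest):

```r
# Sketch of Figure 1: histograms of Moran's I p-values with and without
# the eigenvector correction. Simulated stand-in values: small p-values
# for the uncorrected OLS regressions, roughly uniform ones after filtering.
pvalueMat <- cbind(runif(100, 0, 0.05),  # OLS: residual autocorrelation
                   runif(100, 0, 1))     # eigenvector-augmented
par(mfrow = c(1, 2))
hist(pvalueMat[, 1], main = "OLS residuals", xlab = "Moran's I p-value")
hist(pvalueMat[, 2], main = "Eigenvector augmented", xlab = "Moran's I p-value")
```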
To sum up, by incorporating the eigenvectors we successfully removed spatial residual autocorrelation from the regressions. The BMA framework helped us to explicitly deal with uncertainty stemming from the construction of the weight matrices and allowed us to obtain posterior inclusion probabilities of the weighting matrices as well as the usual BMA statistics while accounting for spatial correlation.
Finally, once again please note that all functions of the BMA package BMS are fully applicable to its spatial filtering variant. The remainder of this illustration shows posterior plots to assess convergence of the MCMC sampler, the posterior distribution of coefficients of interest, an image plot as well as a plot of the posterior model size. You can also type spatBMS.demo to get a short demonstration of the main functions of spatFilt.bms.
Bibliography
- Anselin, Luc, 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.
- Crespo Cuaresma, Jesús and Feldkircher, Martin. 2010. Spatial Filtering, Model Uncertainty and the Speed of Income Convergence in Europe. Working Papers 160, Oesterreichische Nationalbank.
- Feldkircher, Martin and Zeugner, Stefan. 2009. Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging. IMF Working Paper, WP/09/202.
- Fernández, Carmen and Ley, Eduardo and Steel, Mark F.J. 2001. Benchmark Priors for Bayesian Model Averaging. Journal of Econometrics, 100, 381-427.
- Getis, Arthur and Griffith, Daniel A. 2002. Comparative Spatial Filtering in Regression Analysis. Geographical Analysis, 34:2, 130-140.
- Harrison, D. and Rubinfeld, D.L. 1978. Hedonic prices and the demand for clean air. Journal of Environmental Economics & Management, Vol.5, 81-102.
- Tiefelsdorf, Michael and Griffith, Daniel 2007. Semiparametric Filtering of Spatial Autocorrelation: The Eigenvector Approach. Environment and Planning A, 39, 1193-1221.