Over the past few years, I have noticed that spatial econometric methods have been gaining popularity. This is a welcome trend in my opinion, as the spatial structure of data is something that should be explicitly incorporated into the empirical modelling procedure. Omitting spatial effects assumes that the location coordinates of observations are unrelated to the observable characteristics that the researcher is trying to model. Not a good assumption, particularly in empirical macroeconomics, where the unit of observation is typically a country or region.
Let us start with the prototypical linear regression model:

$$y = X\beta + \varepsilon,$$

where $y$ is the outcome vector, $X$ is a matrix of explanatory variables, and $\varepsilon$ is a vector of i.i.d. disturbances.
The spatial lag model is of the form:

$$y = \rho W y + X\beta + \varepsilon,$$

where $W$ is an $n \times n$ spatial weights matrix and $\rho$ measures the strength of spatial autocorrelation in the outcome.
The spatial weights matrix must be specified by the researcher. For example, let us have a dataset that consists of 3 observations, spatially located on a 1-dimensional Euclidean space, wherein the first observation is a neighbour of the second and the second is a neighbour of the third. The spatial weights matrix for this dataset is then a $3 \times 3$ matrix with $w_{12} = w_{21} = w_{23} = w_{32} = 1$ and zeros elsewhere (including the diagonal). After row-standardization, so that each row sums to one, this becomes

$$W = \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}.$$
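For concreteness, here is a minimal sketch (not from the original analysis) that builds this 3-observation weights matrix by hand and row-standardises it with spdep's mat2listw:

library(spdep)

# binary contiguity matrix for the 3-observation example above
W = matrix(c(0, 1, 0,
             1, 0, 1,
             0, 1, 0), nrow = 3, byrow = TRUE)

# style = "W" row-standardises, so each row sums to one
lw = mat2listw(W, style = "W")
listw2mat(lw)  # recovers the row-standardised matrix shown above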
While the spatial lag model represents a modified version of the basic linear regression model, estimation via OLS is problematic because the spatially lagged dependent variable $Wy$ is endogenous: it is correlated with the disturbance term, so OLS estimates of $\rho$, and consequently of the other coefficients, are biased and inconsistent.
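To see why, consider the reduced form of the model (a standard derivation, not specific to this post):

$$y = (I - \rho W)^{-1}(X\beta + \varepsilon) \quad\Rightarrow\quad Wy = W(I - \rho W)^{-1}(X\beta + \varepsilon),$$

so $Wy$ is itself a function of the disturbances, and $\mathbb{E}[(Wy)'\varepsilon] \neq 0$ whenever $\rho \neq 0$.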
Fortunately, remedying these biases is relatively simple, as a number of estimators exist that will yield an unbiased estimate of the spatial lag parameter and, consequently, of the coefficients on the other explanatory variables (assuming, of course, that these explanatory variables are themselves exogenous). Here, I will consider two: the maximum likelihood estimator (denoted ML) described in Ord (1975), and a generalized two-stage least squares estimator (2SLS), wherein spatial lags and spatial lags of spatial lags of the exogenous variables (i.e. $WX$ and $W^2X$) are used as instruments for the endogenous spatial lag $Wy$.
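To illustrate how these two estimators are invoked in R, here is a sketch using the classic columbus example data distributed with spdep (via the spData package in newer versions); this is just a demonstration of the function calls, not part of the simulation below:

library(spdep)
data(columbus)  # loads the columbus data frame and the col.gal.nb neighbours object
lw = nb2listw(col.gal.nb, style = "W")

# maximum likelihood estimation of the spatial lag model
ml.fit = lagsarlm(CRIME ~ INC + HOVAL, data = columbus, listw = lw)

# generalized 2SLS, instrumenting Wy with spatial lags of the covariates
tsls.fit = stsls(CRIME ~ INC + HOVAL, data = columbus, listw = lw)

coef(ml.fit) ; coef(tsls.fit)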
To examine the properties of these estimators, I run a Monte Carlo experiment comparing four models: OLS omitting the spatial lag, naive OLS including the spatial lag, ML, and 2SLS. First, let us assume that we have 225 observations equally spread over a $15 \times 15$ regular lattice, where each observation's neighbours are those at a Euclidean distance of exactly one (rook contiguity), and let the weights matrix be row-standardized.
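As an aside, for a regular grid like this, spdep's cell2nb offers a shortcut (the full code below builds the same structure from a distance matrix instead):

library(spdep)

# rook-contiguity neighbours on a 15 x 15 regular lattice
nb = cell2nb(15, 15, type = "rook")
lw = nb2listw(nb, style = "W")  # row-standardized weights
summary(nb)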
Taking our spatial weights matrix as defined, we want to simulate the following linear model:

$$y = \rho W y + \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$$

with $\rho = 0.4$, $\beta_1 = 0.5$, $\beta_2 = -0.5$, and $\beta_3 = 1.75$, as set in the code below. The explanatory variables $x_2$ and $x_3$ are themselves spatially autocorrelated, generated via the reduced form $(I - 0.25W)^{-1}u$ with $u \sim N(0, I)$.
Performing 1,000 replications of the above simulation permits us to examine the distributions of the coefficient estimates produced by the four models outlined above. The distributions of these coefficients are displayed in the graphic at the beginning of this post. The spatial autocorrelation parameter $\rho$ is set to 0.4; as the densities show, the two OLS-based models yield biased estimates, while both the ML and 2SLS estimators are centred on the true parameter values.
To perform these calculations, I used the spdep package in R, with the graphic created via ggplot2. The R code I used is below.
library(spdep) ; library(ggplot2) ; library(reshape)

rm(list=ls())

n = 225
data = data.frame(n1=1:n)
# coords on a 15 x 15 lattice
data$lat = rep(1:sqrt(n), sqrt(n))
data$long = sort(rep(1:sqrt(n), sqrt(n)))

# create W matrix: rook contiguity (Euclidean distance of exactly 1)
wt1 = as.matrix(dist(cbind(data$long, data$lat), method = "euclidean", upper=TRUE))
wt1 = ifelse(wt1==1, 1, 0)
diag(wt1) = 0

# row standardize
rs = rowSums(wt1)
wt1 = apply(wt1, 2, function(x) x/rs)
lw1 = mat2listw(wt1, style="W")

# true parameter values
rx = 0.25   # spatial autocorrelation in the covariates
rho = 0.4   # spatial autocorrelation in the outcome
b1 = 0.5 ; b2 = -0.5 ; b3 = 1.75
alp = 0.2

# (I - rho W)^-1 matrices used to generate spatially correlated data
inv1 = invIrW(lw1, rho=rx, method="solve", feasible=NULL)
inv2 = invIrW(lw1, rho=rho, method="solve", feasible=NULL)

sims = 1000
beta1results = matrix(NA, ncol=4, nrow=sims)
beta2results = matrix(NA, ncol=4, nrow=sims)
beta3results = matrix(NA, ncol=4, nrow=sims)
rhoresults = matrix(NA, ncol=3, nrow=sims)

for(i in 1:sims){
  # spatially autocorrelated covariates
  u1 = rnorm(n)
  x2 = inv1 %*% u1
  u2 = rnorm(n)
  x3 = inv1 %*% u2
  v1 = rnorm(n)
  e1 = alp*v1 + rnorm(n)
  data1 = data.frame(cbind(x2, x3), lag.listw(lw1, cbind(x2, x3)))
  names(data1) = c("x2","x3","wx2","wx3")
  # outcome generated from the reduced form of the spatial lag model
  data1$y1 = inv2 %*% (b1 + b2*x2 + b3*x3 + rho*v1 + e1)
  data1$wy1 = lag.listw(lw1, data1$y1)
  # higher-order spatial lags of the covariates
  data1$w2x2 = lag.listw(lw1, data1$wx2)
  data1$w2x3 = lag.listw(lw1, data1$wx3)
  data1$w3x2 = lag.listw(lw1, data1$w2x2)
  data1$w3x3 = lag.listw(lw1, data1$w2x3)
  # four estimators: OLS omitting Wy, OLS including Wy, ML, 2SLS
  m1 = coef(lm(y1 ~ x2 + x3, data1))
  m2 = coef(lm(y1 ~ wy1 + x2 + x3, data1))
  m3 = coef(lagsarlm(y1 ~ x2 + x3, data1, lw1))
  m4 = coef(stsls(y1 ~ x2 + x3, data1, lw1))
  beta1results[i,] = c(m1[1], m2[1], m3[2], m4[2])
  beta2results[i,] = c(m1[2], m2[3], m3[3], m4[3])
  beta3results[i,] = c(m1[3], m2[4], m3[4], m4[4])
  rhoresults[i,] = c(m2[2], m3[1], m4[1])
}

apply(rhoresults, 2, mean) ; apply(rhoresults, 2, sd)
apply(beta1results, 2, mean) ; apply(beta1results, 2, sd)
apply(beta2results, 2, mean) ; apply(beta2results, 2, sd)
apply(beta3results, 2, mean) ; apply(beta3results, 2, sd)

colnames(rhoresults) = c("OLS2","ML","2SLS")
colnames(beta1results) = c("OLS1","OLS2","ML","2SLS")
colnames(beta2results) = c("OLS1","OLS2","ML","2SLS")
colnames(beta3results) = c("OLS1","OLS2","ML","2SLS")

# reshape results for plotting, attaching the true value of each parameter
rhoresults = melt(rhoresults) ; rhoresults$coef = "rho" ; rhoresults$true = 0.4
beta1results = melt(beta1results) ; beta1results$coef = "beta1" ; beta1results$true = 0.5
beta2results = melt(beta2results) ; beta2results$coef = "beta2" ; beta2results$true = -0.5
beta3results = melt(beta3results) ; beta3results$coef = "beta3" ; beta3results$true = 1.75

data = rbind(rhoresults, beta1results, beta2results, beta3results)
data$Estimator = data$X2

ggplot(data, aes(x=value, colour=Estimator, fill=Estimator)) +
  geom_density(alpha=.3) +
  facet_wrap(~ coef, scales="free") +
  geom_vline(aes(xintercept=true)) +
  scale_y_continuous("Density") +
  scale_x_continuous("Effect Size") +
  # opts() is deprecated in current ggplot2; theme() replaces it
  theme(legend.position = "bottom", legend.direction = "horizontal")