Standard, Robust, and Clustered Standard Errors Computed in R
Where do these standard errors come from? Since most statistical packages calculate them automatically, it is not unreasonable to think that many researchers working in applied econometrics are unfamiliar with the exact details of their computation.
For the purposes of illustration, I am going to estimate different standard errors from a basic linear regression model, $y = X\beta + u$, using the fertil2 dataset used in Christopher Baum’s book. Let’s load these data, and estimate a linear regression with the lm function (which estimates the parameters using the all too familiar least squares estimator).
rm(list=ls())
library(foreign)

# load data
children <- read.dta("children.dta")

# lm formula and data
form <- ceb ~ age + agefbrth + usemeth
data <- children

# run regression
r1 <- lm(form, data)

# get standard errors
> summary(r1)

Call:
lm(formula = form, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-6.8900 -0.7213 -0.0017  0.6950  6.2657

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.358134   0.173783   7.815 7.39e-15 ***
age          0.223737   0.003448  64.888  < 2e-16 ***
agefbrth    -0.260663   0.008795 -29.637  < 2e-16 ***
usemeth      0.187370   0.055430   3.380 0.000733 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.463 on 3209 degrees of freedom
  (1148 observations deleted due to missingness)
Multiple R-squared: 0.5726,	Adjusted R-squared: 0.5722
F-statistic:  1433 on 3 and 3209 DF,  p-value: < 2.2e-16
When the error terms are assumed homoskedastic IID, the calculation of standard errors comes from taking the square root of the diagonal elements of the variance-covariance matrix, which is formulated:

$$\widehat{V}[\hat{\beta}] = \hat{\sigma}^2 (X'X)^{-1}$$
In practice, and in R, this is easy to do. Estimate the variance $\hat{\sigma}^2$ by taking the average of the ‘squared’ residuals, $\hat{u}'\hat{u}/(n-k)$, with the appropriate degrees-of-freedom adjustment. Code is below. As you can see, these standard errors correspond exactly to those reported using the lm function.
# get X matrix/predictors
X <- model.matrix(r1)
# number of obs
n <- dim(X)[1]
# number of predictors
k <- dim(X)[2]
# calculate standard errors as in the above:
# square root of the diagonal elements in the vcov matrix
se <- sqrt(diag(solve(crossprod(X)) * as.numeric(crossprod(resid(r1))/(n-k))))
> se
(Intercept)         age    agefbrth     usemeth
0.173782844 0.003448024 0.008795350 0.055429804
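As a quick sanity check (this comparison is an addition, not part of the original derivation), the same numbers fall out of lm’s built-in variance-covariance matrix:

# lm's own vcov method gives identical standard errors
all.equal(se, sqrt(diag(vcov(r1))))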
In the presence of heteroskedasticity, the errors are not IID. Consequently, it is inappropriate to use the average squared residuals. The robust approach, as advocated by White (1980) (and others too), captures heteroskedasticity by assuming that the variance of the residual, while non-constant, can be estimated as a diagonal matrix of each squared residual. The variance-covariance matrix becomes the familiar ‘sandwich’:

$$\widehat{V}[\hat{\beta}] = (X'X)^{-1} X'\hat{\Sigma}X (X'X)^{-1}$$

where the diagonal terms in $\hat{\Sigma}$ will, for the most part, be different, so the $j$-th row-column element will be $\hat{u}_j^2$. Once again, in R this is trivially implemented.
# residual vector
u <- matrix(resid(r1))
# 'meat' part: Sigma is a diagonal matrix with u^2 as its elements
meat1 <- t(X) %*% diag(diag(crossprod(t(u)))) %*% X
# degrees-of-freedom adjustment
dfc <- n/(n-k)
# like before: square root of the diagonal of bread-meat-bread
se <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat1 %*% solve(crossprod(X))))
> se
(Intercept)         age    agefbrth     usemeth
0.167562394 0.004661912 0.009561617 0.060644558
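A side note on the computation above: diag(diag(crossprod(t(u)))) materialises an n-by-n matrix just to hold the squared residuals on its diagonal. An equivalent, much lighter formulation (meat2 and se2 are names introduced here purely for comparison) scales each row of X by its residual before taking the cross-product:

# row i of X scaled by u_i; crossprod then yields t(X) %*% diag(u^2) %*% X
meat2 <- crossprod(X * as.vector(u))
se2 <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat2 %*% solve(crossprod(X))))
all.equal(se, se2)  # should be TRUE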
Adjusting standard errors for clustering can be important. For example, replicating a dataset 100 times should not increase the precision of parameter estimates; however, performing this procedure under the IID assumption will do exactly that. Another example arises in the economics of education: it is reasonable to expect that the error terms of children in the same class are not independent. The quick sketch below demonstrates the first point.
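Here is a minimal demonstration (not from the original post; the stacked and r.stack names are introduced only for this illustration), assuming the children data frame loaded above. Stacking 100 copies of the data adds no information, yet the conventional IID standard errors shrink by roughly a factor of sqrt(100) = 10:

# stack 100 copies of the same data (illustration only)
stacked <- children[rep(seq_len(nrow(children)), 100), ]
r.stack <- lm(ceb ~ age + agefbrth + usemeth, stacked)
# IID standard errors: about one tenth of those from r1
summary(r.stack)$coefficients[, "Std. Error"]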
Clustering standard errors can correct for this. Assume $m$ clusters. As in the robust case, it is the $X'\hat{\Sigma}X$, or ‘meat’, part that needs to be adjusted for clustering. In practice, this involves multiplying the residuals by the predictors within each cluster separately, obtaining $u_g = \sum_{i \in g} \hat{u}_i x_i$ for each cluster $g$; stacking these gives an $m$ by $k$ matrix (where $k$ is the number of predictors). ‘Squaring’ this matrix results in a $k$ by $k$ matrix (the meat part). To get the standard errors, one performs the same steps as before, after adjusting the degrees of freedom for clusters.
# cluster name
cluster <- "children"
# matrix for loops: keep only the rows used in the regression,
# so the cluster variable lines up with X and the residuals
clus <- cbind(X, data[rownames(X), cluster], resid(r1))
colnames(clus)[(dim(clus)[2]-1):dim(clus)[2]] <- c(cluster, "resid")
# number of clusters
m <- dim(table(clus[,cluster]))
# dof adjustment
dfc <- (m/(m-1))*((n-1)/(n-k))
# m by k matrix: sum of u_i * x_i within each cluster
uclust <- matrix(NA, nrow = m, ncol = k)
gs <- names(table(clus[,cluster]))
for(i in 1:m){
  uclust[i,] <- t(matrix(clus[clus[,cluster]==gs[i], k+2])) %*% clus[clus[,cluster]==gs[i], 1:k]
}
# square root of the diagonal of bread-meat-bread, like before
se <- sqrt(diag(solve(crossprod(X)) %*% (t(uclust) %*% uclust) %*% solve(crossprod(X)))*dfc)
> se
(Intercept)         age    agefbrth     usemeth
 0.42485889  0.03150865  0.03542962  0.09435531
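The loop above can also be collapsed into a single line with base R’s rowsum, which sums the rows of resid(r1)*X within each cluster; uclust2 is a name introduced here purely for comparison:

# sum u_i * x_i within each cluster in one step
uclust2 <- rowsum(resid(r1) * X, group = clus[, cluster])
# same meat matrix as the loop version
all.equal(t(uclust) %*% uclust, t(uclust2) %*% uclust2, check.attributes = FALSE)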
For calculating robust standard errors in R, both with more goodies and in (probably) a more efficient way, look at the sandwich package. The same applies to clustering and this paper. However, here is a simple function called ols which carries out all of the calculations discussed above.
ols <- function(form, data, robust=FALSE, cluster=NULL, digits=3){
  r1 <- lm(form, data)
  if(length(cluster)!=0){
    # drop rows with missing values so the cluster variable lines up
    data <- na.omit(data[,c(colnames(r1$model),cluster)])
    r1 <- lm(form, data)
  }
  X <- model.matrix(r1)
  n <- dim(X)[1]
  k <- dim(X)[2]
  if(robust==FALSE & length(cluster)==0){
    # conventional IID standard errors
    se <- sqrt(diag(solve(crossprod(X)) * as.numeric(crossprod(resid(r1))/(n-k))))
    res <- cbind(coef(r1), se)
  }
  if(robust==TRUE){
    # White/robust standard errors
    u <- matrix(resid(r1))
    meat1 <- t(X) %*% diag(diag(crossprod(t(u)))) %*% X
    dfc <- n/(n-k)
    se <- sqrt(dfc*diag(solve(crossprod(X)) %*% meat1 %*% solve(crossprod(X))))
    res <- cbind(coef(r1), se)
  }
  if(length(cluster)!=0){
    # cluster-robust standard errors
    clus <- cbind(X, data[,cluster], resid(r1))
    colnames(clus)[(dim(clus)[2]-1):dim(clus)[2]] <- c(cluster, "resid")
    m <- dim(table(clus[,cluster]))
    dfc <- (m/(m-1))*((n-1)/(n-k))
    uclust <- apply(resid(r1)*X, 2, function(x) tapply(x, clus[,cluster], sum))
    se <- sqrt(diag(solve(crossprod(X)) %*% (t(uclust) %*% uclust) %*% solve(crossprod(X)))*dfc)
    res <- cbind(coef(r1), se)
  }
  # two-sided p-values from the normal distribution
  res <- cbind(res, res[,1]/res[,2], 2*pnorm(-abs(res[,1]/res[,2])))
  res1 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep=""), res)), nrow=dim(res)[1])
  rownames(res1) <- rownames(res)
  colnames(res1) <- c("Estimate","Std. Error","t value","Pr(>|t|)")
  return(res1)
}

# with data as before
> ols(ceb ~ age + agefbrth + usemeth, children)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.174   7.815    0.000
age            0.224      0.003  64.888    0.000
agefbrth      -0.261      0.009 -29.637    0.000
usemeth        0.187      0.055   3.380    0.001
> ols(ceb ~ age + agefbrth + usemeth, children, robust=T)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.168   8.105    0.000
age            0.224      0.005  47.993    0.000
agefbrth      -0.261      0.010 -27.261    0.000
usemeth        0.187      0.061   3.090    0.002
> ols(ceb ~ age + agefbrth + usemeth, children, cluster="children")
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.358      0.425   3.197    0.001
age            0.224      0.032   7.101    0.000
agefbrth      -0.261      0.035  -7.357    0.000
usemeth        0.187      0.094   1.986    0.047
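As a cross-check on the hand-rolled numbers above, here is a minimal sketch using the sandwich package (assuming it is installed; vcovCL is available in recent versions of the package). vcovHC with type = "HC1" applies the same n/(n-k) adjustment used above, and vcovCL should match the clustered results up to the exact finite-sample correction it applies:

library(sandwich)
# robust: HC1 uses the same n/(n-k) degrees-of-freedom adjustment
sqrt(diag(vcovHC(r1, type = "HC1")))
# clustered on the children variable, as above
sqrt(diag(vcovCL(r1, cluster = ~children, type = "HC1")))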