Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This tutorial illustrates how to use Bayesian Model Averaging (BMA) with panel data using the R
package BMS.
Contents
- Introduction
- Fixed Effects Estimation by Demeaning the Data
- Fixed Effects Estimation with Dummy Variables
- Bibliography
- Downloads
A pdf version of this tutorial is available here.
Introduction
Methods for estimating econometric models with panel data have been frequently discussed in the literature (see eg Mundlak, 1978).
Two estimators seem to resurface most often: the fixed effects estimator (FE) and the random effects estimator (RE). Both have their separate virtues and underlying assumptions (for an exposition see
Bartels, 2008). Since the FE estimator can be easily cast into the linear regression framework that is used for the BMA package BMS
, it will be our focus in this tutorial. For an application of Bayesian Model
Averaging employing the RE estimator please refer to Moral-Benito (2011). Furthermore, a great deal of the literature seems to pivot around the question of how to calculate standard errors (Bartels, 2008). At the current stage, we abstract from the calculation of so-called clustered standard errors since we stay in a pure Bayesian framework in BMS
(and standard errors are a classical concept).
For the purpose of illustration we will use the data put forward in
Moral-Benito (2011). The data contains K=35
variables (including the dependent variable, the growth rate of per capita GDP) for N=73
countries and for the period 1960-2000. The appendix lists the variables
together with a short description. The dependent variable, GDP growth, is calculated for five year averages resulting into eight observations per country. Moral-Benito (2011) argues in favor of calculating averages of
flow variables, while stock variables have been measured at the first year of each five-year period. The data can be downloaded at Moralbenito.com (‘download data for the paper
Determinants of Economic Growth: A Bayesian Panel Data Approach and unzip).
library(BMS)After having started
R
and loaded the BMS
package we can read in the data file:
data.raw=read.table("Dataset.csv",header=TRUE,sep=";") panelDat=data.raw[,-c(1:4)] rownames(panelDat)=paste(data.raw[,"code"],data.raw[,"year"],sep="_") panelDat=panelDat[,-charmatch(c("GRW_WB","GDP_WB"),colnames(panelDat))]
For that purpose I have simply converted the ‘Dataset.xls’ file to a ‘Dataset.csv’ file, replaced all missing values by ‘NA’ in excel and put it online. You can also therefore directly download the data with the command:
< !-- data here paneldat.rda –>load(url("http://bms.zeugner.eu/blog/images/2011-05/paneldat.rda"))
Let us now peek into the data with the following commands:
panelDat=as.matrix(panelDat) head(panelDat)
The rownames of the data are a combination of the country code and the year of the observation. The data is provided in a data.frame
consisting of stacked observations per column.
That is, the first column containing the dependent variable consists
Y1 = (y1,1,y1,T=8…,yN,1,…,yN,T)
This is also the format we can use
later on when calling the
bms
function.
We will use two approaches to estimate a fixed effects (FE) panel (country / time fixed effects). The first approach makes use of the Frisch-Waugh-Lovell theorem (see eg Lovell, 2008), boiling down to demeaning the data accordingly. That is, in the case of country fixed effects, subtract from each observation (dependent and independent variable) the within country mean. For the case of time fixed effects, subtract from each observation the mean across countries per time period. We will start with the country fixed effects first.
For that purpose we will have to re-shape the data frame and put it into the form of a three dimensional array (T x K x N
). That can be achieved with the function panel_unstack
. Since
bms
uses data in its stacked form, we have to make use of panel_stack
as well. Both functions are not part of the BMS
library and thus have to be copied and pasted into your R
console by yourself:
panel_unstack= function(stackeddata, tstep=NULL) { # stackeddata is a stacked data frame/matrix ordered in the following way # Variable1 Variable2 # ctry1_time1 # ctry1_time2 # ... # ctry2_time1 # ctry2_time2 # ... # tstep is the number of time points # panel_unstack produces a 3-dimensional array out of that bigT=nrow(stackeddata);K=ncol(stackeddata); if (is.null(tstep)) tstep=bigT X1=aperm(array(as.vector(t(as.matrix(stackeddata))),dim=c(K,tstep,bigT/tstep)), perm=c(2,1,3)) try(dimnames(X1)[[1]] <- unique(sapply(strsplit(rownames(stackeddata),"_"), function(x) x[[2]])), silent=TRUE) try(dimnames(X1)[[2]] <- colnames(stackeddata), silent=TRUE) try(dimnames(X1)[[3]] <- unique(sapply(strsplit(rownames(stackeddata),"_"), function(x) x[[1]])), silent=TRUE) return(X1) } panel_stack = function(array3d) { x1= apply(array3d,2,rbind) try(rownames(x1) <- as.vector(sapply(dimnames(array3d)[[3]], FUN=function(x) paste(x, dimnames(array3d)[[1]], sep="_"))), silent=TRUE) return(as.data.frame(x1)) }
We can now easily transform the data from its stacked form into the three-dimensional array via:
dat.array=panel_unstack(panelDat, tstep=8)
where we have set tstep=8
since we have eight time periods per country. The advantages of the three-dimensional array are that we can easily access each dimension of the data:
dat.array[,,"ZWE"] dat.array["1965",,] dat.array[,"GSH",]
Fixed Effects Estimation by Demeaning the Data
The function demean
(again not part of the BMS
library, so copy and paste the following lines into your R
console) demeans the data to estimate individual (eg country), time and
individual and time fixed effects. It takes as argument the three dimensional data array we have created above (dat.array
) and via margin
we can specify over which dimension we want to
demean the data (country / time).
demean = function(x, margin) { #x is an array #margin is the dimension along which should be demeaned if (!is.array(x)) stop("x must be an array/matrix") otherdims=(1:length(dim(x)))[-margin] sweep(x,otherdims,apply(x,otherdims,mean)) }Demeaning is now easily accomplished by:
timeDat=panel_stack(demean(dat.array,3)) countryDat=panel_stack(demean(dat.array,1))where we have used
panel_stack
to re-transform the demeaned data into its stacked form that can be passed to the bms
function.
Since in the data frame only the first 12 explanatory variable show variation over time, we will restrict estimation to these variables only.
# only the first 12 variables show time variation, so estimate panel with those regressors only modelCd=bms(countryDat[,1:13],user.int=F) modelTd=bms(timeDat[,1:13],user.int=F)
We will briefly discuss the results in the next section (see Table 1 and Table 2). Note that demeaning the data yields the same posterior estimates for the coefficients as with incorporating the FE directly, the approach we opt for in the next section. However, the posterior variance for the coefficient estimates is not identical (though very similar). Also note that demeaning does not save you from the degrees of freedom problem when incorporating the large set of fixed effects by a set of dummy variables. For an application using BMA with country FEs see for example Crespo Cuaresma et al. (2009).
Fixed Effects Estimation with Dummy Variables
We will now turn to the second possibility of estimating FEs, which is the dummy variable approach. The advantage of the dummy variable approach is also that it yields estimates for the FEs which can be important for some applications. For the dummy approach we will make use of the newbms
feature of holding variables constant (not sampling) them by the bms
argument fixed.reg
. Please make sure that you have installed a BMS
version >= 0.3. We start now with creating the country dummies:
# country dummies bigT=nrow(panelDat);tstep=8; countryDummies=kronecker(diag(bigT/tstep),rep(1,tstep)); colnames(countryDummies)=dimnames(dat.array)[[3]] # we have to leave one out (first column in this example, but of course is not relevant which column we leave out) countryDummies=countryDummies[,-1]In a same fashion we can easily create a set of time dummies:
# time series dummies timeDummies=matrix(diag(tstep),bigT,tstep,byrow=T); colnames(timeDummies)=dimnames(dat.array)[[1]] # we have to leave one out (first column in this example, but of course is not relevant which column we leave out) timeDummies=timeDummies[,-1] modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)Running the two regressions (for the first 13 elements of the demeaned data frame only):
modelCdummy=bms(cbind(panelDat[,1:13],countryDummies),fixed.reg=colnames(countryDummies),user.int=F) modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)
should yield the same results as with demeaning. Type coef(modelCd)
and coef(modelCdummy)
to get the results in R
. These are summarized in the Table below:
PIP Post Mean Post SD | PIP Post Mean Post SD GDP_PWT 1.000 -0.244 0.025 |1.000 -0.244 0.025 POP 1.000 0.000 0.000 |1.000 0.000 0.000 PGRW 0.286 -0.534 0.994 |0.286 -0.534 0.994 IPR 0.327 0.000 0.000 |0.327 0.000 0.000 OPEM 0.999 0.156 0.032 |0.999 0.156 0.032 CSH 0.999 -0.296 0.066 |0.999 -0.296 0.066 GSH 0.981 -0.458 0.139 |0.980 -0.458 0.139 ISH 0.523 0.133 0.148 |0.522 0.133 0.148 LBF 0.510 0.282 0.321 |0.509 0.281 0.321 LEX 0.129 0.000 0.001 |0.129 0.000 0.001 PDE 0.070 0.000 0.000 |0.070 0.000 0.000 URBP 0.083 -0.006 0.043 |0.083 -0.006 0.043 Table 1: Estimation of country fixed effects: Left panel based on demeaning the data, right panel on the dummy variable estimation approach
As one can see the results are very similar to each other. Since we have used 'full enumeration' no stochastic variability should be expected for the two approaches.
However, when using large data sets and thus the MCMC sampler in turn, please bear in mind that there might be some stochastic variation of results when running differen bms
chains.
Posterior coefficients for the model employing country fixed effects are to be interpreted with respect to the within variation: A positive
coefficient on the variable measuring the country's openness (OPEM
) means that if openness increases within a country GDP growth is incraesing. On the other hand, time fixed effects in
the particular example look at the between variation of the data. That is, if openness across countries (at once) increases, does this affect GDP growth? From Table 2 (equivalent to coef(modelTd); coef(modelTdummy)
we see that this effect is smaller compared to that for the within transformed data.
PIP Post Mean Post SD PIP Post Mean Post SD GDP_PWT 1.000 -0.071 0.013 1.000 -0.071 0.013 POP 0.433 0.000 0.000 0.432 0.000 0.000 PGRW 0.994 -2.542 0.647 0.994 -2.542 0.647 IPR 0.771 0.000 0.000 0.770 0.000 0.000 OPEM 0.376 0.013 0.019 0.376 0.013 0.019 CSH 0.095 -0.004 0.016 0.095 -0.004 0.016 GSH 0.136 -0.012 0.039 0.136 -0.012 0.039 ISH 0.864 0.212 0.113 0.864 0.212 0.113 LBF 0.060 0.001 0.026 0.060 0.001 0.026 LEX 1.000 0.007 0.001 1.000 0.007 0.001 PDE 0.347 0.000 0.000 0.347 0.000 0.000 URBP 0.103 -0.005 0.021 0.102 -0.005 0.021 Table 2: Estimation of time fixed effects: Left panel based on demeaning the data, right panel on the dummy variable estimation approach
Bibliography
- Bartels, Brandon (2008). Beyond 'Fixed versus Random Effects': A Framework for Improving Substantive and Statistical Analysis of Panel, Time-Series Cross-Sectional, and Multilevel Data. Mimeo, Stony Brook University, New York.
- Crespo Cuaresma, J. and Doppelhofer, G. and Feldkircher, M. 2009: The Determinants of Economic Growth in European Regions. CESifo Working Paper Series, No. 2519.
- Lovell, M., 2008. A Simple Proof of the FWL (Frisch,Waugh,Lovell) Theorem. Journal of Economic Education.
- Moral Benito, Enrique (2011). Determinants of Economic Growth: A Bayesian Panel Data Approach. The Review of Economics and Statistics, forthcoming (working paper version)
- Mundlak, Yair, 1978. On the Pooling of Time Series and Cross Section Data. Econometrica, Vol. 46, p. 69-85.
Downloads
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.