Let’s Party!

Posted on June 6, 2012 by diffuseprior in R bloggers

[This article was first published on DiffusePrioR » R, and kindly contributed to R-bloggers].

Exploring whether regression coefficients differ between groups is an important part of applied econometric research, particularly for research with a policy-based objective.

For example, a government in a developing country may decide to introduce free school lunches in an effort to improve childhood health. However, if this treatment is known to improve the health only of boys from the lowest socioeconomic stratum, it makes sense to target this group, while the public resources that would otherwise have been spent on the other groups could be allocated more efficiently elsewhere.

There are two conventional approaches to estimating these potentially differing effects. The first involves manually partitioning the data and performing separate analyses (e.g. one regression for boys and another for girls). The second involves including interaction terms in the regression model, which allows different groups to have different slopes.
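As a minimal base-R sketch of the two conventional approaches, the code below simulates a small dataset with a hypothetical binary `boy` indicator (not data from this post). It fits one regression per group, then a single regression with an interaction term. With a full interaction (group-specific intercept and slope), the point estimates of the two approaches coincide; the interacted model simply pools the error variance.

```r
set.seed(1)
n <- 500
boy <- rbinom(n, 1, 0.5)  # hypothetical group indicator (1 = boy, 0 = girl)
x <- rnorm(n)
# slope of x is 0.2 for girls and 0.2 + 0.6 = 0.8 for boys
y <- 1 + 0.5 * boy + (0.2 + 0.6 * boy) * x + rnorm(n)
dat <- data.frame(y, x, boy)

# Approach 1: split the sample and run separate regressions
fit_girls <- lm(y ~ x, data = subset(dat, boy == 0))
fit_boys  <- lm(y ~ x, data = subset(dat, boy == 1))

# Approach 2: one regression with an interaction term,
# so each group gets its own intercept and slope
fit_int <- lm(y ~ x * boy, data = dat)

slope_girls <- coef(fit_int)[["x"]]
slope_boys  <- coef(fit_int)[["x"]] + coef(fit_int)[["x:boy"]]

# the interacted model reproduces the split-sample slopes
all.equal(slope_girls, coef(fit_girls)[["x"]])
all.equal(slope_boys,  coef(fit_boys)[["x"]])
```

Both comparisons should return `TRUE`, up to floating-point tolerance.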

Problems with these strategies arise when researchers would like to stratify the analysis across many groups. Splitting the sample, or adding ever more interaction terms, can be both a confusing (triple interaction terms, anyone?) and an inefficient way to conduct research. Furthermore, the results of stratification across a large number of groups can be difficult to present in a research paper (think of a table with one hundred result columns).
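To give a rough sense of how quickly the interaction approach grows, the sketch below counts the columns of the design matrix when a continuous regressor `x` is fully interacted with k binary grouping factors. The helper `n_coef` and the factor names `g1`, ..., `gk` are hypothetical and used only for illustration; the count comes from base R's `model.matrix`.

```r
# Hypothetical helper: number of coefficients when a continuous regressor x
# is fully interacted with k binary grouping factors g1, ..., gk
n_coef <- function(k) {
  groups <- paste0("g", seq_len(k))
  # one row per combination of group levels (2^k rows)
  grid <- expand.grid(rep(list(factor(0:1)), k))
  names(grid) <- groups
  grid$x <- seq_len(nrow(grid))  # the values of x do not affect the column count
  f <- as.formula(paste("~ x *", paste(groups, collapse = " * ")))
  ncol(model.matrix(f, data = grid))
}

sapply(1:5, n_coef)
# [1]  4  8 16 32 64
```

Each additional binary grouping doubles the number of coefficients (2^(k+1)), so even five groupings already produce 64 parameters to estimate and report.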

Thankfully, the party package on CRAN offers a neat solution to these concerns, as its functions provide procedures for model-based stratification. A model-based approach has the obvious advantage that it avoids unnecessary splitting of the data, and can therefore be seen as a more efficient way of analyzing group differences.

The model-based approach takes the regression model of interest and partitions the observations into groups based on parameter instabilities detected by structural break tests. More information on such tests is given in Zeileis (2005).
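As I understand the method, mob() itself relies on score-based (M-fluctuation) parameter instability tests, which are not reproduced here. The sketch below swaps in a simpler and more familiar relative, a Chow-type F test, to convey the same idea for a single, known partitioning variable. It asks whether letting the coefficients differ across a hypothetical grouping factor `g` fits significantly better than one pooled regression.

```r
set.seed(2)
n <- 600
g <- factor(sample(c("a", "b"), n, replace = TRUE))  # hypothetical partitioning variable
x <- rnorm(n)
# slope of x is -0.5 in group "a" and -0.5 + 1.0 = +0.5 in group "b"
y <- 1 - 0.5 * x + 1.0 * x * (g == "b") + rnorm(n)

pooled  <- lm(y ~ x)      # one set of coefficients for everyone
grouped <- lm(y ~ x * g)  # group-specific intercept and slope

# Chow-type F test: do the group-specific coefficients fit significantly better?
chow <- anova(pooled, grouped)
chow[2, "Pr(>F)"]
```

A small p-value indicates parameter instability across `g`. Model-based recursive partitioning repeats this kind of check over many candidate partitioning variables and split points, which is what makes it attractive when there are many possible groups.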

Below, I provide a simple example of the party package at work. Of course, I encourage interested users to read both the package vignettes and the associated literature before performing more complicated analysis on real data. Let there be three groups (z). In group 0, the effect of x on y is -0.5; in groups 1 and 2, this effect is +0.5 (the intercept is also 1 in group 0 and 3 in groups 1 and 2). Based on the resulting plot, we can see that the model-based recursive partitioning approach recovers both the correct split (group 0 versus groups 1 and 2) and the correct parameter estimates.

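library(party)
set.seed(1988)

# set up simulated data: three groups (z = 0, 1, 2)
n <- 2000
z <- sample(c(0, 1, 2), n, replace = TRUE)
z1 <- ifelse(z == 1, 1, 0)  # dummy for group 1
z2 <- ifelse(z == 2, 1, 0)  # dummy for group 2
x <- rnorm(n, 0, 1)
# slope of x is -0.5 in group 0 and -0.5 + 1 = +0.5 in groups 1 and 2
y <- 1 + 2*z1 + 2*z2 - 0.5*x + x*z1 + x*z2 + rnorm(n, 0, 1)
dat <- data.frame(y = y, x = x, z = factor(z))

# model-based partitioning of the regression y ~ x
# over the groups indicated by z
mod1 <- mob(y ~ x | z, data = dat)
# nice plot of the results
plot(mod1)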
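To sanity-check the estimates shown in the plot without relying on party, one can fit ordinary least squares separately on the true partition. This sketch uses only base R and assumes the split is already known (group 0 versus groups 1 and 2), whereas mob() has to discover it from the data. From the simulation equation, the targets are an intercept of 1 and a slope of -0.5 in group 0, and an intercept of 1 + 2 = 3 and a slope of -0.5 + 1 = +0.5 in groups 1 and 2.

```r
set.seed(1988)
n <- 2000
z <- sample(c(0, 1, 2), n, replace = TRUE)
x <- rnorm(n, 0, 1)
# same data-generating process as above, written per group
y <- ifelse(z == 0, 1 - 0.5 * x, 3 + 0.5 * x) + rnorm(n, 0, 1)

# the true partition: group 0 versus groups 1 and 2
grp <- factor(ifelse(z == 0, "z = 0", "z = 1 or 2"))

# one OLS fit per partition, coefficients collected row-wise
fits <- lapply(split(data.frame(y, x), grp), function(d) lm(y ~ x, data = d))
est <- t(sapply(fits, coef))
round(est, 2)
```

The rows of `est` should lie close to (1, -0.5) and (3, 0.5); the mob() estimates can be compared against them.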