Site icon R-bloggers

Handling hierarchical data structure in R

[This article was first published on Shige's Research Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R has a comprehensive set of tools for the handling of hierarchical data structure. The most widely used package is probably “nlme” and “lme4”, contributed by Douglas Bates and colleagues. While “nlme” is older and probably more mature than “lme4”, it seems that the latter is increasingly taking over the former because it comes with full set of tools to handle non-Gaussian distributions.

Based on “nlme” and “lme4”, the package “mgcv” and “gamm4” can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don’t feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package “npmlreg” can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new “MCMCglmm” package extends R’s modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as “R2WinBUGS” and “R2jags”. These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

To leave a comment for the author, please follow the link and comment on their blog: Shige's Research Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.