Negative Binomial Regression for Complex Samples (Surveys) #rstats
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides svyglm()
, to fit generalised linear models to data from a complex survey design. svyglm()
covers all families that are also provided by R’s glm()
– however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic svymle()
to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to svymle()
, to fit negative binomial models for complex samples. So I wrote a small „wrapper“ and implemented a function svyglm.nb()
in my sjstats-package.
# ------------------------------------------ # This example reproduces the results from # Lumley 2010, figure E.7 (Appendix E, p256) # ------------------------------------------ library(sjstats) library(survey) data(nhanes_sample) # create survey design des <- svydesign( id = ~ SDMVPSU, strat = ~ SDMVSTRA, weights = ~ WTINT2YR, nest = TRUE, data = nhanes_sample ) # fit negative binomial regression fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des) # print coefficients and standard errors round(cbind(coef(fit), survey::SE(fit)), 2) #> [,1] [,2] #> theta.(Intercept) 0.81 0.05 #> eta.(Intercept) 2.29 0.16 #> eta.factor(RIAGENDR)2 -0.80 0.18 #> eta.log(age) 1.07 0.23 #> eta.factor(RIDRETH1)2 0.08 0.15 #> eta.factor(RIDRETH1)3 0.09 0.18 #> eta.factor(RIDRETH1)4 0.82 0.30 #> eta.factor(RIDRETH1)5 0.06 0.38 #> eta.factor(RIAGENDR)2:log(age) -1.22 0.27 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)3 0.60 0.19 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)4 0.06 0.37 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)5 0.38 0.44
The functions returns an object of class svymle
, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like predict()
are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.
Tagged: R, rstats, Statistik
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.