Generating ANOVA-like table from GLMM using parametric bootstrap
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This article may also be found on RPubs: http://rpubs.com/hughes/63269
In the list of worst to best way to test for effect in GLMM the list on http://glmm.wikidot.com/faq state that parametric bootstrapping is among the best options. PBmodcomp in the pbkrtest package implement such parametric bootstrapping by comparing a full model to a null one. The function simulate data (the response vector) from the null model then fit these data to the null and full model and derive a likelihood ratio test for each of the simulated data. Then we can compare the observed likelihood ratio test to the null distribution generated from the many simulation and derive a p-value. The advantage of using such a method over the classical p-values derived from a chi-square test on the likelihood ratio test is that in the parametric bootstrap we do not assume any null distribution (like chi-square) but instead derive our own null distribution from the model and the data at hand. We do not make the assumption then that the likelihood ratio test statistic is chi-square distributed. I have made a little function that wraps around the PBmodcomp function to compute bootstrapped p-values for each term in a model by sequentially adding them. This lead to anova-like table that are typically obtained when one use the command anova on a glm object.
#the libraries used library(lme4) library(arm) library(pbkrtest) #the function anova_merMod<-function(model,rand,w=NULL,seed=round(runif(1,0,100),0),nsim=50){ data<-model@frame if(!is.null(w)){ data<-data[,-grep("(weights)",names(data))] } resp<-names(model.frame(model))[1] #generate a list of reduced model formula fs<-list() fs[[1]]<-as.formula(paste(resp,"~ 1 +",rand)) nb_terms<-length(attr(terms(model),"term.labels")) if(nb_terms>1){ for(i in 1:nb_terms){ tmp<-c(attr(terms(model),"term.labels")[1:i],rand) fs[[i+1]]<-reformulate(tmp,response=resp) } } #fit the reduced model to the data fam<-family(model)[1]$family if(fam=="gaussian"){ m_fit<-lapply(fs,function(x) lmer(x,data,REML=FALSE)) } else if(fam=="binomial"){ m_fit<-lapply(fs,function(x) glmer(x,data,family=fam,weights=w)) } else{ m_fit<-lapply(fs,function(x) glmer(x,data,family=fam)) } #compare nested model with one another and get LRT values (ie increase in the likelihood of the models as parameters are added) tab_out<-NULL for(i in 1:(length(m_fit)-1)){ comp<-PBmodcomp(m_fit[[i+1]],m_fit[[i]],seed=seed,nsim=nsim) term_added<-attr(terms(m_fit[[i+1]]),"term.labels")[length(attr(terms(m_fit[[i+1]]),"term.labels"))] #here are reported the bootstrapped p-values, ie not assuming any parametric distribution like chi-square to the LRT values generated under the null model #these p-values represent the number of time the simulated LRT value (under null model) are larger than the observe one tmp<-data.frame(term=term_added,LRT=comp$test$stat[1],p_value=comp$test$p.value[2]) tab_out<-rbind(tab_out,tmp) print(paste("Variable ",term_added," tested",sep="")) } print(paste("Seed set to:",seed)) return(tab_out) }
You pass your GLMM model to the function together with the random part as character (see example below), if you fitted a binomial GLMM you also need to provide the weights as a vector, you can then set a seed and the last argument is the number of simulation to do, it is set by default to 50 for rapid checking purpose but if you want to report these results larger values (ie 1000, 10000) should be used.
Let’s look at a simple LMM example:
data(grouseticks) m<-lmer(TICKS~cHEIGHT+YEAR+(1|BROOD),grouseticks) summary(m) ## Linear mixed model fit by REML ['lmerMod'] ## Formula: TICKS ~ cHEIGHT + YEAR + (1 | BROOD) ## Data: grouseticks ## ## REML criterion at convergence: 2755 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.406 -0.246 -0.036 0.146 5.807 ## ## Random effects: ## Groups Name Variance Std.Dev. ## BROOD (Intercept) 87.3 9.34 ## Residual 28.1 5.30 ## Number of obs: 403, groups: BROOD, 118 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 5.4947 1.6238 3.38 ## cHEIGHT -0.1045 0.0264 -3.95 ## YEAR96 4.1910 2.2424 1.87 ## YEAR97 -4.3304 2.2708 -1.91 ## ## Correlation of Fixed Effects: ## (Intr) cHEIGH YEAR96 ## cHEIGHT -0.091 ## YEAR96 -0.726 0.088 ## YEAR97 -0.714 0.052 0.518 anova_merMod(model=m,rand="(1|BROOD)") ## [1] "Variable cHEIGHT tested" ## [1] "Variable YEAR tested" ## [1] "Seed set to: 63" ## term LRT p_value ## 1 cHEIGHT 14.55 0.01961 ## 2 YEAR 14.40 0.01961
The resulting table show for each term in the model the likelihood ratio test, which is basically the decrease in deviance when going from the null to the full model and the p value, you may look at the PBtest line in the details of ?PBmodcomp to see how it is computed.
Now let’s see how to use the function with binomial GLMM:
#simulate some binomial data x1<-runif(100,-2,2) x2<-runif(100,-2,2) group<-gl(n = 20,k = 5) rnd.eff<-rnorm(20,mean=0,sd=1.5) p<-1+0.5*x1-2*x2+rnd.eff[group]+rnorm(100,0,0.3) y<-rbinom(n = 100,size = 10,prob = invlogit(p)) prop<-y/10 #fit a model m<-glmer(prop~x1+x2+(1|group),family="binomial",weights = rep(10,100)) summary(m) ## Generalized linear mixed model fit by maximum likelihood (Laplace ## Approximation) [glmerMod] ## Family: binomial ( logit ) ## Formula: prop ~ x1 + x2 + (1 | group) ## Weights: rep(10, 100) ## ## AIC BIC logLik deviance df.resid ## 288.6 299.1 -140.3 280.6 96 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -2.334 -0.503 0.181 0.580 2.466 ## ## Random effects: ## Groups Name Variance Std.Dev. ## group (Intercept) 1.38 1.18 ## Number of obs: 100, groups: group, 20 ## ## Fixed effects: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.748 0.287 2.61 0.0092 ** ## x1 0.524 0.104 5.02 5.3e-07 *** ## x2 -2.083 0.143 -14.56 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Correlation of Fixed Effects: ## (Intr) x1 ## x1 0.090 ## x2 -0.205 -0.345 #takes some time anova_merMod(m,rand = "(1|group)",w = rep(10,100)) ## [1] "Variable x1 tested" ## [1] "Variable x2 tested" ## [1] "Seed set to: 98" ## term LRT p_value ## 1 x1 0.0429 0.80392 ## 2 x2 502.0921 0.01961
For binomial model, the model must be fitted with proportion data and a vector of weights (ie the number of binomial trial) must be passed to the ‘w’ argument. Some warning message may pop up at the end of the function, this comes from convergence failure in PBmodcomp, this do not affect the results, you may read the article from the pbkrtest package: http://www.jstatsoft.org/v59/i09/ to understand better where this comes from.
Happy modeling and as Ben Bolker say: “When all else fails, don’t forget to keep p-values in perspective: http://www.phdcomics.com/comics/archive.php?comicid=905 “
Filed under: R and Stat Tagged: bootstrap, GLMM, R
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.