Stella Copeland’s Intro to Mixed Models in R
[This article was first published on Noam Ross - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In D-RUG today, Stella Copeland gave a quick introduction to mixed models in R. Here’s the script that she presented:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
setwd() #if needed | |
#a collection of approaches to mixed models in R | |
#this is intended to be a short demo on mixed model approaches | |
#BUT!!!! (Disclaimer) | |
#The examples below are very likely NOT to be the best representations of a mixed model, even for this dataset! | |
data<-read.csv("MixedModelExData.csv",header=T) | |
head(data) #What does this all mean? | |
#ElevCat = Elevation Category, | |
length(levels(data$ElevCat)) #5 elevations | |
#Site Number within Elevation Band | |
length(levels(data$SiteNumber)) #9 sites | |
#Microclimate.Category = Warm Cool Right #3 microclimates | |
levels(data$Microclimate.Category) | |
#PlotNumber | |
length(levels(data$PlotNumber)) #225 plots - 9 plots/micro/site | |
#Subplots = all unique, nested within elevation, site, microclimate.category,PlotNumber | |
length(unique(data$Subplot)) #450 plots - 2 / plot | |
#main point: lots of nesting!! Now - what about data? | |
#Neighbor removal experiment - Neighbor Removal = one subplot / plot | |
#CoverBeforeRemoval & Removed Biomass & Control Cover - potential covariates for the neighbor removal treatment | |
#Response variables - Heightcm (height of individuals), TotalLeaves (# of leaves/plant) | |
#Em2011Bin & Em2012Bin - Emergence in 2011 and 2012 - binary response 1 for emergence, 0 if did not emerge | |
#Babies - # of offspring from 2011 individuals in 2012 | |
#Our predictors of interest are: | |
#Categorical: Elevation, microclimate, neighbor removal | |
#Continuous: Biomass or Cover | |
#Our responses are: | |
#Continuous: Height, Leaves | |
#Binomial: Emergence | |
#Count: Babies | |
#How do we account for random effects of site and plotnumber within site and subplot? | |
#One - if we have a normally distributed response variable this would be a linear mixed model that means you could use: | |
#aov or lmer(lme4 package) | |
#Say we want to know how biomass is affected by microclimate? | |
#Maybe it would be best to average the data by plot beforehand, but we'll ignore that for demonstration purposes: | |
#Also, CoverBeforeRemoval is NOT normally distributed...but for the sake of demonstration... | |
CoverModel<-aov(CoverBeforeRemoval~Microclimate.Category+Error(SiteNumber),data=data) #since we have microclimate occurs within site | |
summary(CoverModel) | |
#CoverModel.withPlot<-aov(CoverBeforeRemoval~Microclimate.Category*ElevCat+Error(SiteNumber/PlotNumber),data=data) #this doesn't work! | |
#What about the generalized linear model case?!! | |
#Binomial, poisson, etc...generalized linear mixed models...lme4 | |
library(lme4) | |
#the structure here is different - random effects are specified in paratheses | |
#glmer and lmer are almost the same - lmer with a family other than guassian will fit a generalized linear model | |
#parantheses in lmer mean that the effect is a random effect, 1/means that the intercept varies, effect/random means the slope varies | |
#some examples at vignette("Implementation","lme4") | |
CoverModel.lmer1<-lmer(CoverBeforeRemoval~Microclimate.Category+(1|SiteNumber/PlotNumber),data=data) #This should fit a linear mixed model by default. | |
summary(CoverModel.lmer1) | |
CoverModel.lmer2<-lmer(CoverBeforeRemoval~Microclimate.Category+(1|SiteNumber),data=data) | |
anova(CoverModel.lmer1,CoverModel.lmer2) #we can compare models to decide on use of random effects | |
#Now let's work with a count variable - number of offspring - that happens to be more or less from a poisson distribution | |
BabiesMod1<-lmer(Babies~NeighborRemoval*Microclimate.Category+(1|SiteNumber/PlotNumber),data=data,family=poisson)# | |
summary(BabiesMod1) | |
BabiesMod2<-lmer(Babies~NeighborRemoval*Microclimate.Category+(1|SiteNumber/Microclimate.Category/PlotNumber),data=data,family=poisson)#Nest microclimate within Site | |
summary(BabiesMod2) | |
#binomial distribution works similarly... | |
EmergMod1<-lmer(Em2011Bin~NeighborRemoval*Microclimate.Category+(1|SiteNumber/PlotNumber),data=data,family=binomial)# | |
summary(EmergMod1) | |
#maybe we wanted to know about the effect of a continuous variable - cover - on the emergence of the control plots | |
EmergControl1<-lmer(Em2011Bin~Microclimate.Category+ElevCat+ControlCover+(1|SiteNumber),data=data[which(data$NeighborRemoval=="No"),], | |
family=binomial)# | |
summary(EmergControl1) | |
#maybe we should worry about overdispersion? there are ways to handle that (see Bolker et. al.) | |
rdev <- sum(residuals(BabiesMod1)^2) | |
mdf <- length(fixef(BabiesMod1)) | |
rdf <- nrow(data)-mdf #residual degrees of freedom | |
rdev/rdf #0.313 which is less than 1...there are other ways of looking at overdispersion, but this is one simple example | |
#so how do we extract confidence intervals and all that good stuff?! | |
#Unfortunately, mcmcsamp in the lmer4 package does not work for poisson distributions (and maybe non-normal distributions in general, for now) | |
library(MCMCglmm) | |
#note that the model structure here is a bit different...random and fixed are separated out | |
sampBabiesMod1 = MCMCglmm(fixed=Babies~NeighborRemoval*Microclimate.Category, random=~SiteNumber, | |
data = data,family="poisson") #generate markov chain monte carlo (mcmc) sample for the parameters of the model, and then provides confidence intervals | |
library(MASS) #according to Bolker et. al. (TREE article), penalized loglikelihood may not be a great approach, but here's an example | |
modPQL<-glmmPQL(Babies~NeighborRemoval*Microclimate.Category,random = ~1|SiteNumber,data=data,family=poisson) | |
#also.... | |
#package glmmML | |
library(glmmML) #maximimum likelihood approaches, fewer options for distributions (family) | |
?glmmML | |
#package glmmADMB - download here...http://glmmadmb.r-forge.r-project.org/ slow to download, and I am not sure how this compared to other packages...but promising! | |
library(glmmADMB) # | |
#Join the r mixed models listserve! | |
#Note that there is a LOT of discussion (in general, and on the R list) about how to calculate p-values/degrees of freedom for mixed models! Buyer beware... | |
#This is a useful consolatidated version of the listserve discussion topics: http://glmm.wikidot.com/faq |
Get the data file for this script here
Stella also recommends this paper by Ben Bolker as a quick introduction to the topic.
To leave a comment for the author, please follow the link and comment on their blog: Noam Ross - R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.