Exercise in REML/Mixed model
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I want to build a bit more experience in REML, so I decided to redo some of the SAS examples in R. This post describes the results of example 59.1 (page 5001, SAS(R)/STAT User guide 12.3 link). Following the list from freshbiostats I will analyze using lme4 and MCMCglm.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
The data is a split plot design. To quote ‘The split-plot design involves two experimental factors, A and B. Levels of A are randomly assigned to whole plots (main plots), and levels of B are randomly assigned to split plots (subplots) within each whole plot.’. I will read the data in r1 then convert to a suitable data.frame sp.r1 <- read.table(textConnection('
1 1 1 56 1 1 2 41
1 2 1 50 1 2 2 36
1 3 1 39 1 3 2 35
2 1 1 30 2 1 2 25
2 2 1 36 2 2 2 28
2 3 1 33 2 3 2 30
3 1 1 32 3 1 2 24
3 2 1 31 3 2 2 27
3 3 1 15 3 3 2 19
4 1 1 30 4 1 2 25
4 2 1 35 4 2 2 30
4 3 1 17 4 3 2 18′),
col.names=c(‘Block1′,’A1′,’B1′,’Y1′,’Block2′,’A2′,’B2′,’Y2’))
sp <- with(r1,data.frame(
Block=factor(c(Block1,Block2)),
A=factor(c(A1,A2)),
B=factor(c(B1,B2)),
Y=c(Y1,Y2)))
Analysis
I won’t repeat the preliminary part (number of levels, number of observations etc.). It is not the R philosophy to have that within the mixed model. Hence it remains to calculate variances, determe the significance of fixed effects and a mean with standard deviation for the first level of factor a.
lme4
The model summary gives same variances for the random effects, same residual log likelihood (‘-2 Res Log Likelihood’ in proc mixed is ‘REMLdev’ in lme4), different values for AIC, BIC. lme4 does not by default give tests for fixed effects.
l1 <- lmer(Y ~ A + B + A*B + (1 |Block) + ( 1| A : Block) ,data=sp)
summary(l1)
Linear mixed model fit by REML
Formula: Y ~ A + B + A * B + (1 | Block) + (1 | A:Block)
Data: sp
AIC BIC logLik deviance REMLdev
137.8 148.4 -59.88 141.7 119.8
Random effects:
Groups Name Variance Std.Dev.
A:Block (Intercept) 15.3819 3.9220
Block (Intercept) 62.3958 7.8991
Residual 9.3611 3.0596
Number of obs: 24, groups: A:Block, 12; Block, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 37.000 4.667 7.928
A2 1.000 3.517 0.284
A3 -11.000 3.517 -3.127
B2 -8.250 2.163 -3.813
A2:B2 0.500 3.060 0.163
A3:B2 7.750 3.060 2.533
Correlation of Fixed Effects:
(Intr) A2 A3 B2 A2:B2
A2 -0.377
A3 -0.377 0.500
B2 -0.232 0.308 0.308
A2:B2 0.164 -0.435 -0.217 -0.707
A3:B2 0.164 -0.217 -0.435 -0.707 0.500
Test for fixed effects
The example gives tests for all fixed effects. I am following the school where testing a main effect when it is present in an interaction has no meaning, so I will only test the A*B interaction. There are two ways to go about this; compare hierarchical models via anova and simulation. Proc mixed has p-value 0.0566. Anova gave 0.0204, simulation 0.068.
l2 <- lmer(Y ~ A + B + (1 |Block) + ( 1| A : Block) ,data=sp)
anova(l2,l1)
Data: sp
Models:
l2: Y ~ A + B + (1 | Block) + (1 | A:Block)
l1: Y ~ A + B + A * B + (1 | Block) + (1 | A:Block)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
l2 7 163.47 171.71 -74.734
l1 9 159.69 170.29 -70.844 7.7796 2 0.02045 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
pboot <- function(m0,m1) {
s <- simulate(m0)
L0 <- logLik(refit(m0,s))
L1 <- logLik(refit(m1,s))
2*(L1-L0)
}
obsdev <- 2*( as.numeric(logLik(l1))-as.numeric(logLik(l2)))
set.seed(1001)
reps <- replicate(1000,pboot(l2,l1))
mean(reps>obsdev)
[1] 0.068
mean for a level 1
The example provides the mean with three standard errors, depending on the inference space. The broad inference space is of interest, mean 32.875, sd=4.54. In lme4 we will run this as another simulation. The sd is a bit smaller.
mcs <- mcmcsamp(l1,10000)
mcsdf <- as.data.frame(mcs)
c(mean=mean(mcsdf[,1]+.5*mcsdf[,’B2′]),sd=sd(mcsdf[,1]+.5*mcsdf[,’B2′]))
mean sd
32.913643 3.459708
MCMCglm
MCMCglm has a different perspective, for one, it is Bayesian However this does not preclude us from looking at the data. It seems the variances are completely different, much larger. More on this later. Fixed effects look quite like the ones from lme4.
library(MCMCglmm)
m1 <- MCMCglmm(Y ~ A + B + A*B, random= ~Block + A : Block ,
data=sp,family=’gaussian’,nitt=500000,thin=20,
burnin=50000,verbose=FALSE)
summary(m1)
Iterations = 50001:499981
Thinning interval = 20
Sample size = 22500
DIC: 145.191
G-structure: ~Block
post.mean l-95% CI u-95% CI eff.samp
Block 149.8 1.985e-17 459.8 21784
~A:Block
post.mean l-95% CI u-95% CI eff.samp
A:Block 25.99 5.336e-17 120 374.1
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 18.81 3.764 39.78 606.9
Location effects: Y ~ A + B + A * B
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 36.9516 23.9439 49.5868 22500 0.00213 **
A2 0.9780 -8.4952 10.5917 24190 0.80222
A3 -11.0647 -20.8373 -1.5902 22500 0.03076 *
B2 -8.2458 -14.2918 -2.0896 22991 0.01413 *
A2:B2 0.5208 -7.8395 9.7543 22500 0.89973
A3:B2 7.7596 -1.0876 16.2564 22500 0.07511 .
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Test for fixed effects
Hypothesis tests do not really exist in this context. However, one can have a look if 0,0 for the terms A2:B2 and A3:B2 at the same time is probable. Both these terms are positive, but it seems that 3% of the samples have both terms negative.
table(sign(m1$Sol[,’A2:B2′]),sign(m1$Sol[,’A3:B2′]))/nrow(m1$Sol)
-1 1
-1 0.03195556 0.41791111
1 0.00560000 0.54453333
mean for a level 1
This can be pulled from the fixed effects samples. The standard deviation is a bit larger than from proc mixed and lme4.
c(mean=mean(m1$Sol[,1]+.5*m1$Sol[,’B2′]),
sd=sd(m1$Sol[,1]+.5*m1$Sol[,’B2′]))
mean sd
32.828740 6.655951
MCMC variances discussion
MCMCglm had strange results for variations. For further examination I drew some quantiles.
The first concerns blocks, the posterior mean of 150 is way of from 62 which both lme4 and proc mixed give. Still, the median seems reasonable.
The second is A:block, 15 from proc mixed and lme4. Many, many MCMC samples have very low values, showing to a different solution. But there are also 10% samples over 85.
Third is units, proc mixed and lme4 have 9.4, most of the samples are much larger.
lapply(1:3,function(i) quantile(m1$VCV[,i],seq(.1,.9,.2)))
[[1]]
10% 30% 50% 70% 90%
6.036660e-08 2.804703e+01 5.683922e+01 1.050030e+02 2.735606e+02
[[2]]
10% 30% 50% 70% 90%
6.975880e-13 4.202219e-06 2.628300e+00 2.210180e+01 8.511472e+01
[[3]]
10% 30% 50% 70% 90%
7.104689 11.608162 16.596443 22.304393 32.750846
It almost seems as MCMCglm finds samples where A:blocks is absent, while units get a much higher variation. Obviously, this alternative model cn be run in lme4, and compared with the first model. The A:Blocks term is not significant, which makes this explanation less plausible.
l3 <- lmer(Y ~ A + B + A*B + (1 |Block) ,data=sp)
summary(l3)
Linear mixed model fit by REML
Formula: Y ~ A + B + A * B + (1 | Block)
Data: sp
AIC BIC logLik deviance REMLdev
139.6 149 -61.81 146.8 123.6
Random effects:
Groups Name Variance Std.Dev.
Block (Intercept) 65.472 8.0915
Residual 21.667 4.6547
Number of obs: 24, groups: Block, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 37.000 4.667 7.928
A2 1.000 3.291 0.304
A3 -11.000 3.291 -3.342
B2 -8.250 3.291 -2.507
A2:B2 0.500 4.655 0.107
A3:B2 7.750 4.655 1.665
Correlation of Fixed Effects:
(Intr) A2 A3 B2 A2:B2
A2 -0.353
A3 -0.353 0.500
B2 -0.353 0.500 0.500
A2:B2 0.249 -0.707 -0.354 -0.707
A3:B2 0.249 -0.354 -0.707 -0.707 0.500
anova(l1,l3)
Data: sp
Models:
l3: Y ~ A + B + A * B + (1 | Block)
l1: Y ~ A + B + A * B + (1 | Block) + (1 | A:Block)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
l3 8 162.83 172.25 -73.414
l1 9 159.69 170.29 -70.844 5.1407 1 0.02337 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
obsdev13 <- 2*( as.numeric(logLik(l1))-as.numeric(logLik(l3)))
reps13 <- replicate(1000,pboot(l3,l1))
mean(reps13>obsdev13)
[1] 0.027
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.