[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Continuing my exploration of mixed models, I now understand what is happening in the second SAS(R)/STAT example for proc mixed (page 5007 of the SAS/STAT 12.3 Manual). It is all about correlation between the time-points within subjects. The data as such is simple, size measurements of children at ages 8, 10, 12 and 14. The subject of analysis is growth. There is correlation between the data at different ages. Then there are the usual subjects; Random subjects in intercept, fixed effects for gender and age effect and the interaction age*gender.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
To summarize the results. With lme I get the fixed structure of the first two models, but the R structure has too high values. With MCMCglm I only was able to create the first model. Here the R structure was quite different, but equivalent to yet another lme model. For interpretation purposes, the fixed model parts are the same. The difference between the two lme models escapes me for now.
Data
Data from SAS/STAT guide. To have the base levels as the SAS analysis there is a relevel for Gender, other than that it is straightforward reading data and a reshape().
r1 <- read.table(textConnection(‘
1 F 21.0 20.0 21.5 23.0
2 F 21.0 21.5 24.0 25.5
3 F 20.5 24.0 24.5 26.0
4 F 23.5 24.5 25.0 26.5
5 F 21.5 23.0 22.5 23.5
6 F 20.0 21.0 21.0 22.5
7 F 21.5 22.5 23.0 25.0
8 F 23.0 23.0 23.5 24.0
9 F 20.0 21.0 22.0 21.5
10 F 16.5 19.0 19.0 19.5
11 F 24.5 25.0 28.0 28.0
12 M 26.0 25.0 29.0 31.0
13 M 21.5 22.5 23.0 26.5
14 M 23.0 22.5 24.0 27.5
15 M 25.5 27.5 26.5 27.0
16 M 20.0 23.5 22.5 26.0
17 M 24.5 25.5 27.0 28.5
18 M 22.0 22.0 24.5 26.5
19 M 24.0 21.5 24.5 25.5
20 M 23.0 20.5 31.0 26.0
21 M 27.5 28.0 31.0 31.5
22 M 23.0 23.0 23.5 25.0
23 M 21.5 23.5 24.0 28.0
24 M 17.0 24.5 26.0 29.5
25 M 22.5 25.5 25.5 26.0
26 M 23.0 24.5 26.0 30.0
27 M 22.0 21.5 23.5 25.0
‘),col.names=c(‘Person’,’Gender’,’Age8′,’Age10′,’Age12′,’Age14′),
colClasses=c(‘factor’,’factor’,rep(‘numeric’,4)))
rm <- reshape(r1,direction=’long’,
varying=list(c(‘Age8′,’Age10′,’Age12′,’Age14’)),
timevar=’Age’,idvar=c(‘Person’,’Gender’),
v.names=’y’,
times=c(8,10,12,14))
rm$Gender <- relevel(rm$Gender,ref=’M’)
rm$fage=factor(rm$Age)
nlme
Ignoring correlation, the analysis is simple. However, this is not the analysis which is presented in the STAT Users Guide.
lNull <- lme(y ~ Age * Gender,data=rm, random= ~ 1 | Person,
method=’REML’)
summary(lNull)
Linear mixed-effects model fit by REML
Data: rm
AIC BIC logLik
445.7572 461.6236 -216.8786
Random effects:
Formula: ~1 | Person
(Intercept) Residual
StdDev: 1.816214 1.386382
Fixed effects: y ~ Age * Gender
Value Std.Error DF t-value p-value
(Intercept) 16.340625 0.9813122 79 16.651810 0.0000
Age 0.784375 0.0775011 79 10.120823 0.0000
GenderF 1.032102 1.5374208 25 0.671321 0.5082
Age:GenderF -0.304830 0.1214209 79 -2.510520 0.0141
Correlation:
(Intr) Age GendrF
Age -0.869
GenderF -0.638 0.555
Age:GenderF 0.555 -0.638 -0.869
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.59804400 -0.45461690 0.01578365 0.50244658 3.68620792
Number of Observations: 108
Number of Groups: 27
First Model
The first model in the guide should be general symmetric in R structure. While I first modeled this in the correlation term (see below), I ended up building this in the random term. The current model has fixed effects exactly like PROC MIXED, associated test very close, but the R matrix is twice as large.
lSymm <- lme(y ~ Age * Gender,
data=rm, random= list(Person =pdSymm(~ fage-1)),method=’ML’)
summary(lSymm)
Linear mixed-effects model fit by maximum likelihood
Data: rm
AIC BIC logLik
449.477 489.709 -209.7385
Random effects:
Formula: ~fage – 1 | Person
Structure: General positive-definite
StdDev Corr
fage8 2.1650807 fage8 fage10 fage12
fage10 1.8698474 0.603
fage12 2.3554563 0.708 0.617
fage14 2.0460614 0.569 0.800 0.793
Residual 0.6569738
Fixed effects: y ~ Age * Gender
Value Std.Error DF t-value p-value
(Intercept) 15.842297 0.9534264 79 16.616171 0.0000
Age 0.826803 0.0806212 79 10.255400 0.0000
GenderF 1.583072 1.4937321 25 1.059810 0.2994
Age:GenderF -0.350438 0.1263091 79 -2.774446 0.0069
Correlation:
(Intr) Age GendrF
Age -0.875
GenderF -0.638 0.559
Age:GenderF 0.559 -0.638 -0.875
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.34109378 -0.30114043 0.03182357 0.26360983 1.69966216
Number of Observations: 108
Number of Groups: 27
as.matrix(lSymm$modelStruct$reStruct)
$Person
fage8 fage10 fage12 fage14
fage8 10.860557 5.655273 8.365118 5.843739
fage10 5.655273 8.100582 6.296156 7.095082
fage12 8.365118 6.296156 12.854465 8.858492
fage14 5.843739 7.095082 8.858492 9.699319
Second model
The second model is compound symmetry. Again, the statements are easy, but the R values twice as large. Fixed values match the PROC MIXED example output.
lCSymm <- lme(y ~ Age * Gender,
data=rm, random= list(Person =pdCompSymm(~ fage-1)),method=’ML’)
summary(lCSymm)
Linear mixed-effects model fit by maximum likelihood
Data: rm
AIC BIC logLik
442.6391 461.414 -214.3195
Random effects:
Formula: ~fage – 1 | Person
Structure: Compound Symmetry
StdDev Corr
fage8 2.1024306
fage10 2.1024306 0.686
fage12 2.1024306 0.686 0.686
fage14 2.1024306 0.686 0.686 0.686
Residual 0.6963793
Fixed effects: y ~ Age * Gender
Value Std.Error DF t-value p-value
(Intercept) 16.340625 0.9814310 79 16.649795 0.0000
Age 0.784375 0.0779963 79 10.056564 0.0000
GenderF 1.032102 1.5376069 25 0.671239 0.5082
Age:GenderF -0.304830 0.1221968 79 -2.494580 0.0147
Correlation:
(Intr) Age GendrF
Age -0.874
GenderF -0.638 0.558
Age:GenderF 0.558 -0.638 -0.874
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.854861527 -0.235701022 0.007918635 0.265357548 1.898850367
Number of Observations: 108
Number of Groups: 27
as.matrix(lCSymm$modelStruct$reStruct)
$Person
fage8 fage10 fage12 fage14
fage8 9.114895 6.249301 6.249301 6.249301
fage10 6.249301 9.114895 6.249301 6.249301
fage12 6.249301 6.249301 9.114895 6.249301
fage14 6.249301 6.249301 6.249301 9.114895
MCMCglmm
I ran into the limits of MCMCglmm pretty quickly. However, the first model was possible after defining a prior. (As a note, a slightly more verbose output while setting the prior would have made life a bit more easy). The R structure is quite far from PROC MIXED.
library(MCMCglmm)
rm$fage=factor(rm$Age)
prior1 <- list(R=list(V=diag(4),nu=.01),
G=list(G1=list(V=diag(1),nu=.01)
m1 <- MCMCglmm(y ~ Age* Gender ,
random= ~ Person ,
rcov=~ us(fage) :Person,
data=rm,family=’gaussian’,
nitt=500000,thin=20,burnin=50000,
verbose=FALSE
, prior=prior1
)
summary(m1)
Iterations = 50001:499981
Thinning interval = 20
Sample size = 22500
DIC: 123.8128
G-structure: ~Person
post.mean l-95% CI u-95% CI eff.samp
Person 4.209 2.136 6.963 5615
R-structure: ~us(fage):Person
post.mean l-95% CI u-95% CI eff.samp
8:8.Person 2.6261 0.45693 5.5948 562.4
10:8.Person -0.5219 -1.89671 1.0416 659.6
12:8.Person 0.3808 -1.55471 2.6806 517.4
14:8.Person -0.7821 -2.32815 0.8618 862.4
8:10.Person -0.5219 -1.89671 1.0416 659.6
10:10.Person 1.4867 0.06493 3.4580 493.6
12:10.Person -0.4788 -1.89693 1.1443 756.1
14:10.Person 0.1536 -1.06686 1.7433 515.8
8:12.Person 0.3808 -1.55471 2.6806 517.4
10:12.Person -0.4788 -1.89693 1.1443 756.1
12:12.Person 3.0773 0.55387 6.2826 546.7
14:12.Person 0.6970 -1.03654 2.9108 505.7
8:14.Person -0.7821 -2.32815 0.8618 862.4
10:14.Person 0.1536 -1.06686 1.7433 515.8
12:14.Person 0.6970 -1.03654 2.9108 505.7
14:14.Person 1.9824 0.21537 4.3931 529.9
Location effects: y ~ Age * Gender
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 15.85825 13.71176 18.11169 18083 <4e-05 ***
Age 0.82697 0.64584 1.00791 22500 <4e-05 ***
GenderF 1.55014 -1.84468 4.93357 22500 0.3507
Age:GenderF -0.35005 -0.62738 -0.06956 22500 0.0163 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
lme comparison
It is interesting to notice the R correlation is close to slightly different formulation of lme, where not the random structure, but rather the correlation is set. The fixed effects seem the same between model formulations.lSymm <- lme(y ~ Age * Gender,corr=corSymm(form= ~ I(Age/2-3)),
data=rm, random= ~ 1 | Person,method=’ML’)
summary(lSymm)
Linear mixed-effects model fit by maximum likelihood
Data: rm
AIC BIC logLik
445.2348 477.4204 -210.6174
Random effects:
Formula: ~1 | Person
(Intercept) Residual
StdDev: 1.753195 1.356352
Correlation Structure: General
Formula: ~I(Age/2 – 3) | Person
Parameter estimate(s):
Correlation:
1 2 3
2 -0.203
3 0.006 -0.192
4 -0.313 0.322 0.214
Fixed effects: y ~ Age * Gender
Value Std.Error DF t-value p-value
(Intercept) 15.920019 0.9766449 79 16.300724 0.0000
Age 0.824758 0.0810365 79 10.177603 0.0000
GenderF 1.488148 1.5301085 25 0.972577 0.3401
Age:GenderF -0.348628 0.1269599 79 -2.745971 0.0075
Correlation:
(Intr) Age GendrF
Age -0.874
GenderF -0.638 0.558
Age:GenderF 0.558 -0.638 -0.874
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2236086696 -0.5564845586 0.0009453569 0.5124328827 3.7675873271
Number of Observations: 108
Number of Groups: 27
as.matrix(lSymm$modelStruct$corStruct)[[2]].
[,1] [,2] [,3] [,4]
[1,] 1.000000000 -0.2027774 0.005515599 -0.3128891
[2,] -0.202777439 1.0000000 -0.192400735 0.3222584
[3,] 0.005515599 -0.1924007 1.000000000 0.2142248
[4,] -0.312889054 0.3222584 0.214224798 1.0000000
cov2cor(matrix(colMeans(m1$VCV[,-1]),nrow=4))
[2,] -0.202777439 1.0000000 -0.192400735 0.3222584
[3,] 0.005515599 -0.1924007 1.000000000 0.2142248
[4,] -0.312889054 0.3222584 0.214224798 1.0000000
cov2cor(matrix(colMeans(m1$VCV[,-1]),nrow=4))
[,1] [,2] [,3] [,4]
[1,] 1.0000000 -0.26413228 0.1339446 -0.34279042
[2,] -0.2641323 1.00000000 -0.2238354 0.08945827
[3,] 0.1339446 -0.22383536 1.0000000 0.28219796
[4,] -0.3427904 0.08945827 0.2821980 1.00000000
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.