I often get asked how to fit different multilevel models (or individual growth models, hierarchical linear models, or linear mixed models, etc.) in R. In this guide I have compiled some of the more common and/or useful models (at least common in clinical psychology), and how to fit them using nlme::lme() and lme4::lmer(). I will cover the common two-level random intercept-slope model, three-level models where subjects are clustered by some higher-level grouping (such as therapists), partially nested models where there is clustering in one group but not the other, and different level 1 residual covariance structures (such as AR(1)). The point of this post is to show how to fit these longitudinal models in R, not to cover the statistical theory behind them or how to interpret them.
Data format
In all examples I assume this data structure.
subjects | tx | therapist | time | y |
---|---|---|---|---|
1 | 0 | 1 | 0 | 10 |
1 | 0 | 1 | 1 | 12 |
1 | 0 | 1 | 2 | 14 |
2 | 0 | 1 | 0 | 4 |
2 | 0 | 1 | 1 | 14 |
2 | 0 | 1 | 2 | 13 |
3 | 0 | 2 | 0 | 12 |
3 | 0 | 2 | 1 | 15 |
3 | 0 | 2 | 2 | 16 |
4 | 0 | 2 | 0 | 17 |
4 | 0 | 2 | 1 | 13 |
4 | 0 | 2 | 2 | 12 |
5 | 0 | 3 | 0 | 15 |
5 | 0 | 3 | 1 | 13 |
… | … | … | … | … |
Where subjects is each subject's id, tx represents treatment allocation and is coded 0 or 1, therapist refers either to clustering due to therapists or, for instance, to a participant's group in group therapies, and y is the outcome variable.
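For reference, a toy dataset in this long format can be simulated in base R. This is only an illustrative sketch (all values, the number of subjects, and the outcome-generating model are made up to mirror the table above):

```r
# Simulate a toy long-format dataset: 4 subjects x 3 time points,
# subjects 1-2 see therapist 1 (tx = 0), subjects 3-4 see therapist 2 (tx = 1)
set.seed(1)
J <- 4                                   # number of subjects (illustrative)
data <- data.frame(
  subjects  = rep(1:J, each = 3),        # subject id
  tx        = rep(c(0, 1), each = 6),    # treatment allocation (0/1)
  therapist = rep(1:2, each = 6),        # higher-level clustering
  time      = rep(0:2, times = J)        # measurement occasion
)
data$y <- 10 + 2 * data$time + rnorm(nrow(data), sd = 2)  # toy outcome
head(data)
```

A data frame like this is what the `data` argument refers to in all of the model-fitting calls below.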
Longitudinal two-level model
We will begin with the two-level model, where we have repeated measures on individuals in different treatment groups.
Unconditional model
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + U_{0j}
\end{align}
$$
with
$$
U_{0j} \sim\mathcal{N}(0, ~\tau_{00}^2),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ 1 + (1 | subjects), data=data)

# nlme
lme(y ~ 1, random = ~ 1 | subjects, data=data)
```
Unconditional growth model
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + U_{0j} \\
\beta_{1j} &= \gamma_{10} + U_{1j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & \tau_{01}\\ \tau_{01} & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time + (time | subjects), data=data)

# nlme
lme(y ~ time, random = ~ time | subjects, data=data)
```
Conditional growth model
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & \tau_{01}\\ \tau_{01} & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time * tx + (time | subjects), data=data)

# nlme
lme(y ~ time * tx, random = ~ time | subjects, data=data)
```
Conditional growth model: dropping random slope
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j
\end{align}
$$
with
$$
U_{0j} \sim\mathcal{N}(0, ~\tau_{00}^2),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time * tx + (1 | subjects), data=data)

# nlme
lme(y ~ time * tx, random = ~ 1 | subjects, data=data)
```
Conditional growth model: dropping random intercept
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j}
\end{align}
$$
with
$$
U_{1j} \sim\mathcal{N}(0, ~\tau_{10}^2),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time * tx + (0 + time | subjects), data=data)

# nlme
lme(y ~ time * tx, random = ~ 0 + time | subjects, data=data)
```
Conditional growth model: dropping intercept-slope covariance
Model formulation
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & 0\\ 0 & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time * tx + (time || subjects), data=data)

# same as above
lmer(y ~ time * tx + (1 | subjects) + (0 + time | subjects), data=data)

# nlme
lme(y ~ time * tx, random = list(subjects = pdDiag(~time)), data=data)
```
Three-level models
Here I will cover some different three-level models. In my examples clustering at the highest level is due to therapists. But the examples generalize to other forms of clustering as well, such as group therapy or clustering due to health-care provider.
Conditional three-level growth model
We will jump straight to the conditional three-level growth model, with the following model formulation:
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ijk} &= \beta_{0jk} + \beta_{1jk}t_{ijk} + R_{ijk}\\
\text{Level 2}& \notag\\
\beta_{0jk} &= \gamma_{00k} + U_{0jk} \\
\beta_{1jk} &= \gamma_{10k} + U_{1jk} \\
\text{Level 3}& \notag\\
\gamma_{00k} &= \delta_{000} + \delta_{001} TX_k + V_{0k} \\
\gamma_{10k} &= \delta_{100} + \delta_{101} TX_k + V_{1k}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0jk} \\ U_{1jk} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & \tau_{01}\\ \tau_{01} & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
\begin{pmatrix} V_{0k} \\ V_{1k} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \varphi_{00}^2 & \varphi_{01} \\ \varphi_{01} & \varphi_{10}^2 \end{matrix}
\right),
$$
and
$$
R_{ijk} \sim\mathcal{N}(0, ~\sigma^2)
$$
To fit this model we use therapist/subjects, which specifies nesting. This formula expands to a main effect of therapist and an interaction between therapist and subjects (which is the subject-level effect).
```r
# lme4
lmer(y ~ time * tx + (time | therapist/subjects), data=df)

## expands to
lmer(y ~ time * tx + (time | therapist:subjects) + (time | therapist), data=df)

# nlme
lme(y ~ time * tx, random = ~time | therapist/subjects, data=df)

## expands to
lme(y ~ time * tx, random = list(therapist = ~time, subjects = ~time), data=df)
```
Subject level randomization (therapist crossed effect)
In the previous example therapists only provided one type of treatment (a nested design). Sometimes therapist will instead be a crossed effect, i.e., in a parallel-group design each therapist delivers both treatments. If it's a randomized trial, this design gives us subject-level randomization, whereas in the previous example randomization was at the therapist level.
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ijk} &= \beta_{0jk} + \beta_{1jk}t_{ijk} + R_{ijk}\\
\text{Level 2}& \notag\\
\beta_{0jk} &= \gamma_{00k} + \gamma_{01k} TX_{jk} + U_{0jk} \\
\beta_{1jk} &= \gamma_{10k} + \gamma_{11k} TX_{jk} + U_{1jk} \\
\text{Level 3}& \notag\\
\gamma_{00k} &= \delta_{000} + V_{0k} \\
\gamma_{10k} &= \delta_{100} + V_{1k} \\
\gamma_{01k} &= \delta_{010} + V_{2k} \\
\gamma_{11k} &= \delta_{110} + V_{3k}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0jk} \\ U_{1jk} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & \tau_{01}\\ \tau_{01} & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
\begin{pmatrix} V_{0k} \\ V_{1k} \\ V_{2k} \\ V_{3k} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\varphi_{00}^2 & 0 & 0 & 0 \\
0 & \varphi_{10}^2 & 0 & 0 \\
0 & 0 & \varphi_{20}^2 & 0 \\
0 & 0 & 0 & \varphi_{30}^2
\end{matrix}
\right),
$$
and
$$
R_{ijk} \sim\mathcal{N}(0, ~\sigma^2)
$$
In this model we estimate no covariances at level 3. However, at the therapist level we have random effects for time, treatment, and time × treatment. I fit this saturated model because you can easily delete a random effect from the expanded lmer syntax below.
```r
# lme4
lmer(y ~ time * tx +
       (time | therapist:subjects) +
       (time * tx || therapist), data=df)

## expands to
lmer(y ~ time * tx +
       (time | subjects:therapist) +
       (1 | therapist) +
       (0 + tx | therapist) +
       (0 + time | therapist) +
       (0 + time:tx | therapist), data=df)

# nlme
lme(y ~ time * tx,
    random = list(therapist = pdDiag(~time * tx),
                  subjects = ~time), data=df)
```
Different level 3 variance-covariance matrix
We might hypothesize that therapists who are allocated participants that report worse symptoms at treatment start have better outcomes (more room for improvement). To allow for separate covariances in each treatment group we update the variance-covariance matrix at level 3
$$
\begin{pmatrix} V_{0k} \\ V_{1k} \\ V_{2k} \\ V_{3k} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\varphi_{00}^2 & \varphi_{01} & 0 & 0 \\
\varphi_{01} & \varphi_{10}^2 & 0 & 0 \\
0 & 0 & \varphi_{20}^2 & \varphi_{23} \\
0 & 0 & \varphi_{23} & \varphi_{30}^2
\end{matrix}
\right)
$$
To fit this model we run
```r
# lme4
lmer(y ~ time * tx +
       (time | therapist:subjects) +
       (time | therapist) +
       (0 + tx + time:tx | therapist), data=data)

# nlme
lme(y ~ time * tx,
    random = list(therapist = pdBlocked(list(~time, ~0 + tx + time:tx)),
                  subjects = ~time), data=data)
```
Of course, we could also estimate all six covariances at level 3. For instance, we could examine whether therapists who are more successful with Treatment A are also more successful with Treatment B, i.e. $\operatorname{cov}(V_{2k}, V_{3k}) = \varphi_{23}$, and so forth. The full unstructured level 3 variance-covariance matrix we will estimate is thus
$$
\begin{pmatrix} V_{0k} \\ V_{1k} \\ V_{2k} \\ V_{3k} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\varphi_{00}^2 & \varphi_{01} & \varphi_{02} & \varphi_{03} \\
\varphi_{01} & \varphi_{10}^2 & \varphi_{12} & \varphi_{13} \\
\varphi_{02} & \varphi_{12} & \varphi_{20}^2 & \varphi_{23} \\
\varphi_{03} & \varphi_{13} & \varphi_{23} & \varphi_{30}^2
\end{matrix}
\right)
$$
Which we fit by running
```r
# lme4
lmer(y ~ time * tx +
       (time | therapist:subjects) +
       (time * tx | therapist), data=df)

# nlme
lme(y ~ time * tx,
    random = list(therapist = ~time * tx,
                  subjects = ~time), data=df)
```
Partially nested models
Partial nesting occurs when we have nesting in one group but not the other. For instance, we might compare a treatment group to a wait-list condition. Subjects in the wait-list condition will not be nested, but subjects in the treatment group will be nested within therapists.
We can write this model like this
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ijk} &= \beta_{0jk} + \beta_{1jk}t_{ijk} + R_{ijk}\\
\text{Level 2}& \notag\\
\beta_{0jk} &= \gamma_{00} + \gamma_{01k} TX_{jk} + U_{0jk} \\
\beta_{1jk} &= \gamma_{10} + \gamma_{11k} TX_{jk} + U_{1jk} \\
\text{Level 3}& \notag\\
\gamma_{01k} &= \delta_{010} + V_{0k} \\
\gamma_{11k} &= \delta_{110} + V_{1k}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0jk} \\ U_{1jk} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \tau_{00}^2 & \tau_{01}\\ \tau_{01} & \tau_{10}^2 \end{matrix}
\right),
$$
and
$$
\begin{pmatrix} V_{0k} \\ V_{1k} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \end{matrix},
\begin{matrix} \varphi_{00}^2 & 0 \\ 0 & \varphi_{10}^2 \end{matrix}
\right),
$$
and
$$
R_{ijk} \sim\mathcal{N}(0, ~\sigma^2)
$$
```r
# lme4
lmer(y ~ time * tx +
       (1 | therapist:subjects) +
       (0 + time | therapist:subjects) +
       (0 + time:tx | therapist) +
       (0 + tx | therapist), data=data)

# nlme
lme(y ~ time * tx,
    random = list(therapist = pdDiag(~ 0 + time:tx + tx),
                  subjects = pdDiag(~time)), data=data)
```
More on level 1 specification
Heteroscedasticity at Level 1
Only lme allows modeling heteroscedastic residual variance at level 1. If we wanted to extend our two-level model and allow for different level 1 residual variances in the treatment groups, we'd get
$$
\begin{split}
(R_{ij} \mid \text{TX} = 0) &\sim\mathcal{N}(0, ~\sigma_0^2) \\
(R_{ij} \mid \text{TX} = 1) &\sim\mathcal{N}(0, ~\sigma_1^2)
\end{split}
$$
To fit our two-level model with this level 1 structure we'd run
```r
lme(y ~ time * tx,
    random = ~ time | subjects,
    weights = varIdent(form = ~ 1 | tx),
    data=data)
```
More grouping
We could also add another grouping factor such as time, and fit a model with heteroscedastic level 1 residuals for each time point in each treatment group. For instance, for $i = 0, 1, 2$ we get
$$
(R_{ij} \mid \text{TX} = 0) \sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix} \sigma_{00}^2 & 0 & 0\\ 0 & \sigma_{01}^2 & 0 \\ 0 & 0 & \sigma_{02}^2 \end{matrix}
\right)
$$
$$
(R_{ij} \mid \text{TX} = 1) \sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix} \sigma_{10}^2 & 0 & 0\\ 0 & \sigma_{11}^2 & 0 \\ 0 & 0 & \sigma_{12}^2 \end{matrix}
\right)
$$
which we'd fit by running
```r
lme(y ~ time * tx,
    random = ~ time | subjects,
    weights = varIdent(form = ~ 1 | tx * time),
    data=data)
```
First-order Autoregressive AR(1) residuals
For $t = 1, 2, 3, \dots, T$ time points we get the level 1 variance-covariance matrix
$$
\Sigma = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{pmatrix}
$$
which leads to
$$
R_{ij} \sim\mathcal{N}(0, \Sigma)
$$
To fit this level 1 residual structure we use the correlation argument.
```r
lme(y ~ time * tx,
    random = ~ time | subjects,
    correlation = corAR1(),
    data=data)
```
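To see concretely what an AR(1) structure implies, the matrix $\Sigma$ above can be built directly in base R; $\sigma$, $\rho$, and the number of time points here are purely illustrative:

```r
# Build the AR(1) level 1 covariance matrix for T time points:
# Sigma[s, t] = sigma^2 * rho^|s - t|
T_points <- 4
sigma <- 2
rho <- 0.5
lag <- abs(outer(0:(T_points - 1), 0:(T_points - 1), "-"))  # time-lag matrix
Sigma <- sigma^2 * rho^lag
Sigma  # correlations decay geometrically with the lag between occasions
```

This is the matrix corAR1() parameterizes with a single correlation parameter, which lme estimates from the data.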
Heterogeneous AR(1)
We can also extend the level 1 variance-covariance matrix from above to allow for different residual variances at each time point.
$$
\Sigma =
\begin{pmatrix}
\sigma_1^2 & \sigma_1 \sigma_2 \rho & \sigma_1 \sigma_3 \rho^2 & \cdots & \sigma_1 \sigma_T \rho^{T-1} \\
\sigma_2 \sigma_1 \rho & \sigma_2^2 & \sigma_2 \sigma_3 \rho & \cdots & \sigma_2 \sigma_T \rho^{T-2} \\
\sigma_3 \sigma_1 \rho^2 & \sigma_3 \sigma_2 \rho & \sigma_3^2 & \cdots & \sigma_3 \sigma_T \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_T \sigma_1 \rho^{T-1} & \sigma_T \sigma_2 \rho^{T-2} & \sigma_T \sigma_3 \rho^{T-3} & \cdots & \sigma_T^2
\end{pmatrix}
$$
and we have that
$$
R_{ij} \sim\mathcal{N}(0, \Sigma)
$$
To fit this level 1 model we simply use both the correlation and the weights arguments.
```r
lme(y ~ time * tx,
    random = ~ time | subjects,
    weights = varIdent(form = ~ 1 | time),
    correlation = corAR1(),
    data=data)
```
More level 1 variance-covariances matrices
See ?corClasses for the different types of residual variance-covariance matrices lme can estimate.
Changing the functional form of time
All of the examples above assume linear change. Here I will cover some examples of how to model nonlinear change at level 1. The examples are based on the two-level model, but they can easily be combined with the three-level models outlined above.
Quadratic trend
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{1ij} + \beta_{2j}t_{1ij}^2 + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21} TX_j + U_{2j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \\ U_{2j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\tau_{00}^2 & \tau_{01} & \tau_{02} \\
\tau_{01} & \tau_{10}^2 & \tau_{12} \\
\tau_{02} & \tau_{12} & \tau_{20}^2
\end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
```r
# lme4
lmer(y ~ (time + I(time^2)) * tx +
       (time + I(time^2) | subjects), data=data)

# or use poly()
lmer(y ~ poly(time, 2, raw = TRUE) * tx +
       (poly(time, 2, raw = TRUE) | subjects), data=data)

# nlme
lme(y ~ (time + I(time^2)) * tx,
    random = ~time + I(time^2) | subjects, data=data)
```
Orthogonal polynomials
If you’d like to fit orthogonal polynomials you can use the poly() function with raw = FALSE (which is the default).
```r
# lme4
lmer(y ~ poly(time, 2) * tx +
       (poly(time, 2) | subjects), data=data)

# nlme
lme(y ~ poly(time, 2) * tx,
    random = ~poly(time, 2) | subjects, data=data)
```
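A quick base R check shows what poly() does: its columns are orthonormal over the observed time points, so the linear and quadratic terms are uncorrelated with each other and with the intercept (the time values 0 to 5 are illustrative):

```r
# Orthogonal polynomial basis for time = 0..5
tp <- poly(0:5, degree = 2)   # raw = FALSE is the default
round(crossprod(tp), 10)      # identity matrix: columns are orthonormal
round(colSums(tp), 10)        # each column also sums to zero
```

This is why orthogonal polynomials often improve convergence compared with raw time and time^2, which can be highly correlated.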
Piecewise growth curve
Segmenting the time trend into different pieces has more to do with simple dummy coding of regression variables than with any specifics of lme or lmer. However, I will cover some common scenarios anyway.
To fit a piecewise growth model we simply replace time with two dummy variables, time1 and time2, that represent the different time periods. A common scenario is that the first piece represents the acute treatment phase, and the second piece represents the follow-up phase.
Coding scheme 1: separate slopes
Time | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Time 1 | 0 | 1 | 2 | 2 | 2 | 2 |
Time 2 | 0 | 0 | 0 | 1 | 2 | 3 |
Coding scheme 2: incremental/decremental slope
Time | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Time 1 | 0 | 1 | 2 | 3 | 4 | 5 |
Time 2 | 0 | 0 | 0 | 1 | 2 | 3 |
These two coding schemes differ only in the interpretation of the regression coefficients. In scheme 1 the two slope coefficients represent the actual slope in the respective time period, whereas in scheme 2 the coefficient for time 2 represents the deviation from the slope in period 1, i.e. if the estimate is 0 then the rate of change is the same in both periods.
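Both coding schemes can be generated from the original time variable with base R's pmin()/pmax(); the breakpoint at time = 2 matches the tables above:

```r
time <- 0:5

# Scheme 1: separate slopes (piece 1 stops counting at the breakpoint)
time1_s1 <- pmin(time, 2)      # 0 1 2 2 2 2
time2    <- pmax(time - 2, 0)  # 0 0 0 1 2 3

# Scheme 2: incremental slope (piece 1 keeps counting)
time1_s2 <- time               # 0 1 2 3 4 5
```

The time2 variable is identical in both schemes; only the coding of time1 changes.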
We could specify this model like this
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{1ij} + \beta_{2j}t_{2ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21} TX_j + U_{2j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \\ U_{2j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\tau_{00}^2 & \tau_{01} & \tau_{02} \\
\tau_{01} & \tau_{10}^2 & \tau_{12} \\
\tau_{02} & \tau_{12} & \tau_{20}^2
\end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
In this model I’ve specified the full level 2 variance-covariance matrix. We can fit the model like this
```r
J <- 20 # J participants
data$time1 <- rep(0:5, J)
data$time2 <- rep(c(0, 0, 0, 1, 2, 3), J)

# lme4
lmer(y ~ (time1 + time2) * tx +
       (time1 + time2 | subjects), data=data)

# nlme
lme(y ~ (time1 + time2) * tx,
    random = ~time1 + time2 | subjects, data=data)
```
Drop the correlation between time piece 1 and 2
Sometimes you might want to fit a model with a correlation between the random intercept and time piece 1, but no correlation between time piece 2 and the other effects. This would change the level 2 variance-covariance from above to this
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \\ U_{2j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\tau_{00}^2 & \tau_{01} & 0 \\
\tau_{01} & \tau_{10}^2 & 0 \\
0 & 0 & \tau_{20}^2
\end{matrix}
\right)
$$
Fitting this model is straightforward in lmer and more complicated in lme.
```r
# lme4
lmer(y ~ (time1 + time2) * tx +
       (time1 | subjects) +
       (0 + time2 | subjects), data=df)

# nlme
lme(y ~ (time1 + time2) * tx,
    random = list(subjects = pdBlocked(list(~time1, ~0 + time2))),
    data=df)
```
Adding a quadratic effect
We could extend the two-part piecewise growth model to allow for non-linear change during one or both of the pieces. As an example, I’ll cover extending the model to allow for quadratic change during piece 1.
We could write this model like this
$$
\begin{align}
\text{Level 1}& \notag\\
Y_{ij} &= \beta_{0j} + \beta_{1j}t_{1ij} + \beta_{2j}t_{1ij}^2 + \beta_{3j}t_{2ij} + R_{ij}\\
\text{Level 2}& \notag\\
\beta_{0j} &= \gamma_{00} + \gamma_{01} TX_j + U_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} TX_j + U_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21} TX_j + U_{2j} \\
\beta_{3j} &= \gamma_{30} + \gamma_{31} TX_j + U_{3j}
\end{align}
$$
with
$$
\begin{pmatrix} U_{0j} \\ U_{1j} \\ U_{2j} \\ U_{3j} \end{pmatrix}
\sim\mathcal{N}
\left(
\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix},
\begin{matrix}
\tau_{00}^2 & \tau_{01} & \tau_{02} & \tau_{03} \\
\tau_{01} & \tau_{10}^2 & \tau_{12} & \tau_{13} \\
\tau_{02} & \tau_{12} & \tau_{20}^2 & \tau_{23} \\
\tau_{03} & \tau_{13} & \tau_{23} & \tau_{30}^2
\end{matrix}
\right),
$$
and
$$
R_{ij} \sim\mathcal{N}(0, ~\sigma^2)
$$
This model could be fit like this
```r
# lme4
lmer(y ~ (time1 + I(time1^2) + time2) * tx +
       (time1 + I(time1^2) + time2 | subjects), data=df)

# nlme
lme(y ~ (time1 + I(time1^2) + time2) * tx,
    random = ~time1 + I(time1^2) + time2 | subjects, data=df)
```
If you wanted to fit a reduced random effects structure you could use the method outlined in “Drop the correlation between time piece 1 and 2”.
Hypothesis tests
lmer does not report p-values or degrees of freedom; see ?pvalues and the r-sig-mixed-models FAQ for why not. However, there are other packages that will calculate p-values for you. I will cover some of them here.
Wald test
```r
summary(lme.mod)
```
Likelihood ratio test
```r
fm1 <- lmer(y ~ 1 + (1 | subjects), data=data)
fm2 <- lmer(y ~ 1 + (time | subjects), data=data)
# also works with lme objects
anova(fm1, fm2)
```
Profile confidence intervals
```r
confint(lmer.mod)
```
Parametric bootstrap
```r
confint(lmer.mod, method="boot", nsim=1000)
```
Kenward-Roger degrees of freedom approximation
```r
library(lmerTest)
anova(lmer.mod, ddf = "Kenward-Roger")
```
Satterthwaite degrees of freedom approximation
```r
library(lmerTest)
anova(lmer.mod)
# or
summary(lmer.mod)
```
Book recommendations
Not all of these books are specific to R and longitudinal data analysis. However, I’ve found them all useful over the years when working with multilevel/linear mixed models.
- Singer & Willett (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence
- Pinheiro & Bates (2009). Mixed-Effects Models in S and S-PLUS (Statistics and Computing)
- Hedeker & Gibbons (2006). Longitudinal Data Analysis
- Fitzmaurice, Laird, & Ware (2011). Applied Longitudinal Analysis
- Diggle et al (2013). Analysis of Longitudinal Data (Oxford Statistical Science Series)
- Gelman & Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models
- Raudenbush & Bryk (2001). Hierarchical Linear Models: Applications and Data Analysis Methods
- Snijders & Bosker (2011). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling
- Hox, J (2010). Multilevel Analysis: Techniques and Applications, Second Edition
- Goldstein, H (2010). Multilevel Statistical Models
- Mirman (2014). Growth Curve Analysis and Visualization Using R
- Finch, Bolin, & Kelley (2014). Multilevel Modeling Using R
Suggestions, errors or typos
Please don’t hesitate to contact me if you find some errors in this guide.