lmer vs Stan for a somewhat involved dataset.
[This article was first published on Shravan Vasishth's Slog (Statistics blog), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is a comparison of lmer vs Stan output on a mildly complicated dataset from a psychology expt. (Kliegl et al 2011). The data are here: https://www.dropbox.com/s/pwuz1g7rtwy17p1/KWDYZ_test.rda.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The data and paper available from: http://openscience.uni-leipzig.de/index.php/mr2
I should say that datasets from psychology and psycholinguistic can be much more complicated than this. So this was only a modest test of Stan.
The basic result is that I was able to recover in Stan the parameter estimates (fixed effects) that were primarily of interest, compared to the lmer output. The sds of the variance components all come out pretty much the same in Stan vs lmer. The correlations estimated in Stan are much smaller than lmer, but this is normal: the bayesian models seem to be more conservative when it comes to estimating correlations between random effects.
Traceplots are here: https://www.dropbox.com/s/91xhk7ywpvh9q24/traceplotkliegl2011.pdf
They look generally fine to me.
One very important fact about lmer vs Stan is that lmer took 23 seconds to return an answer, but Stan took 18,814 seconds (about 5 hours), running 500 iterations and 2 chains.
One caveat is that I do have to try to figure out how to speed up Stan so that we get the best performance out of it that is possible.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Details: | |
key: | |
beta[1] = Intercept | |
beta[2] = c1 | |
beta[3] = c2 | |
beta[4] = c3 | |
1. Fixed effects, Stan vs lmer: | |
Stan: | |
Predictors: | |
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat | |
beta[1] 381.1 3.5 7.0 367.5 376.2 381.1 385.9 394.1 4 1.4 | |
beta[2] 31.6 0.8 3.5 23.9 29.7 32.0 34.1 37.9 17 1.2 | |
beta[3] 14.0 0.1 2.2 9.9 12.5 13.8 15.4 18.3 292 1.0 | |
beta[4] 2.9 0.1 2.0 -1.3 1.5 3.0 4.2 7.0 251 1.0 | |
lmer: | |
Estimate Std. Error t value | |
beta[1] 389.73 7.15 54.5 | |
beta[2] 33.78 3.31 10.2 | |
beta[3] 13.98 2.32 6.0 | |
beta[4] 2.75 2.22 1.2 | |
2. Random effects, Stan vs lmer: | |
Stan: Var sd | |
# id (Intercept) 3152.9 56.151 | |
# c1 531.6 23.056 0.5054 | |
# c2 89.3 9.4499 -0.01583 0.11337 | |
# c3 77.5 8.8034 -0.094069 -0.4715 0.028849 | |
# Residual 4886 69.9 | |
lmer: Var sd | |
# id (Intercept) 3098.3 55.66 | |
# c1 550.8 23.47 0.603 | |
# c2 121.0 11.00 -0.129 -0.014 | |
# c3 93.2 9.65 -0.247 -0.846 0.376 | |
# Residual 4877.1 69.84 | |
## calculations of correlations from Stan output: | |
r12: 654.3 / (56.151 * 23.056) = 0.5054 = r12 | |
r13: -8.4 / (56.151 * 9.4499) = -0.01583 = r13 | |
r14: -46.5 / (56.151 * 8.8034) = -0.094069 = r14 | |
r23: 24.7 / (23.056 * 9.4499) = 0.11337 = r23 | |
r24: -95.7 / (23.056 * 8.8034) = -0.4715 = r24 | |
r34: 2.4 / (9.4499 * 8.8034) = 0.028849 = r34 | |
## original stan output: | |
Sigma[1,1] 3152.9 32.4 589.5 2197.9 2729.1 3090.3 3522.1 4432.8 331 1.0 | |
Sigma[1,2] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0 | |
Sigma[1,3] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0 | |
Sigma[1,4] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0 | |
Sigma[2,1] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0 | |
Sigma[2,2] 531.6 6.4 109.0 357.4 455.4 517.5 599.2 780.8 289 1.0 | |
Sigma[2,3] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0 | |
Sigma[2,4] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0 | |
Sigma[3,1] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0 | |
Sigma[3,2] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0 | |
Sigma[3,3] 89.3 11.6 53.1 11.5 50.0 85.7 120.6 207.4 21 1.1 | |
Sigma[3,4] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0 | |
Sigma[4,1] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0 | |
Sigma[4,2] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0 | |
Sigma[4,3] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0 | |
Sigma[4,4] 77.5 9.7 44.5 11.2 45.5 68.6 102.4 175.3 21 1.1 | |
#LMER fit: time taken: 23 seconds | |
# Groups Name Variance Std.Dev. Corr | |
# id (Intercept) 3098.3 55.66 | |
# c1 550.8 23.47 0.603 | |
# c2 121.0 11.00 -0.129 -0.014 | |
# c3 93.2 9.65 -0.247 -0.846 0.376 | |
# Residual 4877.1 69.84 | |
#Number of obs: 28710, groups: id, 61 | |
# | |
#Fixed effects: | |
# Estimate Std. Error t value | |
#(Intercept) 389.73 7.15 54.5 | |
#c1 33.78 3.31 10.2 | |
#c2 13.98 2.32 6.0 | |
#c3 2.75 2.22 1.2 | |
##Stan: time taken: 18,814 seconds (5.2261 hours) | |
> fit2 <- stan(model_code = klieglvaryingintslopes_codefast, | |
+ data = dat2, | |
+ iter = 500, chains = 2) | |
TRANSLATING MODEL 'klieglvaryingintslopes_codefast' FROM Stan CODE TO C++ CODE NOW. | |
COMPILING THE C++ CODE FOR MODEL 'klieglvaryingintslopes_codefast' NOW. | |
SAMPLING FOR MODEL 'klieglvaryingintslopes_codefast' NOW (CHAIN 1). | |
Iteration: 500 / 500 [100%] (Sampling) | |
Elapsed Time: 8255.26 seconds (Warm-up) | |
1650.07 seconds (Sampling) | |
9905.33 seconds (Total) | |
SAMPLING FOR MODEL 'klieglvaryingintslopes_codefast' NOW (CHAIN 2). | |
Iteration: 500 / 500 [100%] (Sampling) | |
Elapsed Time: 8062.87 seconds (Warm-up) | |
845.716 seconds (Sampling) | |
8908.59 seconds (Total) | |
> print(fit2) | |
Inference for Stan model: klieglvaryingintslopes_codefast. | |
2 chains, each with iter=500; warmup=250; thin=1; | |
post-warmup draws per chain=250, total post-warmup draws=500. | |
The model code: | |
klieglvaryingintslopes_codefast <- 'data { | |
int<lower=1> N; | |
real rt[N]; //outcome | |
real c1[N]; //predictor | |
real c2[N]; //predictor | |
real c3[N]; //predictor | |
int<lower=1> I; //number of subjects | |
int<lower=1, upper=I> id[N]; //subject id | |
vector[4] mu_prior; //vector of zeros passed in from R | |
} | |
parameters { | |
vector[4] beta; // intercept and slope | |
vector[4] u[I]; // random intercept and slopes | |
real<lower=0> sigma_e; // residual sd | |
vector<lower=0>[4] sigma_u; // subj sd | |
corr_matrix[4] Omega; // correlation matrix for random intercepts and slopes | |
} | |
transformed parameters { | |
matrix[4,4] D; | |
D <- diag_matrix(sigma_u); | |
} | |
model { | |
matrix[4,4] L; | |
matrix[4,4] DL; | |
real mu[N]; // mu for likelihood | |
//priors: | |
beta ~ normal(0,50); | |
sigma_e ~ cauchy(0,2); | |
sigma_u ~ cauchy(0,2); | |
Omega ~ lkj_corr(4.0); | |
L <- cholesky_decompose(Omega); | |
DL <- D * L; | |
for (i in 1:I) // loop for subj random effects | |
u[i] ~ multi_normal_cholesky(mu_prior, DL); | |
for (n in 1:N) { | |
mu[n] <- beta[1] + beta[2]*c1[n] + beta[3]*c2[n] + beta[4]*c3[n] + u[id[n], 1] + u[id[n], 2]*c1[n] + u[id[n], 3]*c2[n] + u[id[n], 4]*c3[n]; | |
} | |
rt ~ normal(mu,sigma_e); // likelihood | |
} | |
generated quantities { | |
cov_matrix[4] Sigma; | |
Sigma <- D * Omega * D; | |
} | |
' | |
To leave a comment for the author, please follow the link and comment on their blog: Shravan Vasishth's Slog (Statistics blog).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.