Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model:
> date ()
[1] "Thu Aug 30 22:12:53 2012"
> fit3 <- stan (fit=fit2, data = schools_dat, iter = 1e4, n_chains = 4)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Iteration: 10000 / 10000 [100%] (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Iteration: 10000 / 10000 [100%] (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Iteration: 10000 / 10000 [100%] (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Iteration: 10000 / 10000 [100%] (Sampling)
> date ()
[1] "Thu Aug 30 22:12:55 2012"
> print (fit3)
Inference for Stan model: anon_model.
4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 8.0 0.1 5.1 -2.0 4.7 8.0 11.3 18.4 4032 1
tau 6.7 0.1 5.6 0.3 2.5 5.4 9.3 21.2 2958 1
eta[1] 0.4 0.0 0.9 -1.5 -0.2 0.4 1.0 2.2 9691 1
eta[2] 0.0 0.0 0.9 -1.7 -0.6 0.0 0.6 1.7 9565 1
eta[3] -0.2 0.0 0.9 -2.0 -0.8 -0.2 0.4 1.7 10990 1
eta[4] 0.0 0.0 0.9 -1.8 -0.6 0.0 0.5 1.7 10719 1
eta[5] -0.3 0.0 0.9 -2.0 -0.9 -0.4 0.2 1.4 8876 1
eta[6] -0.2 0.0 0.9 -1.9 -0.8 -0.2 0.3 1.6 10369 1
eta[7] 0.3 0.0 0.9 -1.5 -0.2 0.4 0.9 2.1 10303 1
eta[8] 0.1 0.0 0.9 -1.8 -0.6 0.1 0.7 1.9 10428 1
theta[1] 11.6 0.1 8.5 -2.2 6.1 10.5 15.8 32.3 5414 1
theta[2] 8.0 0.1 6.3 -4.5 4.0 7.9 11.9 20.8 9549 1
theta[3] 6.2 0.1 7.9 -12.4 2.1 6.8 11.0 20.7 6547 1
theta[4] 7.7 0.1 6.4 -5.3 3.8 7.8 11.7 20.8 9097 1
theta[5] 5.1 0.1 6.3 -8.8 1.4 5.7 9.5 16.3 8700 1
theta[6] 6.2 0.1 6.8 -8.6 2.2 6.5 10.5 19.0 8518 1
theta[7] 10.8 0.1 6.8 -1.0 6.2 10.2 14.8 26.1 7236 1
theta[8] 8.6 0.1 7.8 -6.2 4.0 8.3 12.9 25.3 7375 1
lp__ -4.8 0.0 2.7 -10.6 -6.4 -4.6 -3.0 -0.3 3580 1
Sample were drawn using NUTS2 at Thu Aug 30 22:12:55 2012.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Check the timings. That took two seconds! And, as you can see from the R-hats and effective sample sizes, 10,000 iterations is overkill here. Damn that’s fast.
But, hey, 8 data points is pretty small. Let’s try the 800-schools problem. I’ll first simulate 800 schools worth of data, rerun, and see what happens.
It took 10 seconds. That’s right, 4 chains of 1000 iterations (enough for convergence) for the 800 schools problem, in 10 seconds.
Not bad, not bad at all.
Well, it’s pretty horrible if you’re planning to do something with a billion data points. But for the sorts of problems where we do full Bayes, yes, that’s just fine.
We’ll do more careful speed comparisons later. For now, let me just point out that the 8 schools is not the ideal model to show the strengths of Stan vs. Bugs. The 8 schools model is conditionally conjugate and so Gibbs can work efficiently there.
P.S. Just for laffs, I tried the (nonconjugate) Student-t model (or, as Stan puts it, student_t) with no added parameterizations, I just replaced normal with student_t with 4 df. The runs took 3 seconds for the 10,000 iterations of the 8 schools and 34 seconds for the 1000 iterations of the 800 schools. But I think the reason it took a bit longer is not the nonconjugacy but just that we haven’t vectorized the student_t model yet. Once it’s vectorized, it should be superfast too. That’s just a small implementation detail, nor requiring any tricks or changes to the algorithm.
P.P.S. These models did take 12 seconds each to compile. But that just needs to be done once per model. Once it’s compiled, you can fit it immediately on new data without needing to recompile.
The post Stan is fast appeared first on Statistical Modeling, Causal Inference, and Social Science.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.