Comparing Stan to JAGS for Bayesian Inference (Part 1?)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Stan is a new, open source, Bayesian inference tool. Stan is based on the the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. In order to compare Stan with JAGS, a gibbs sampler approach to Bayesian inference, I used the classic WinBUGS rats example. The rats example uses a hierarchical model to look at the growth curves of 30 rats.
In this example, I used R with RStan and R2jags packages. I used the code on the RStan Getting Started and the examples from the JAGS sourceforge page as starting points. I used a very large number of samples to test run times. Data and model files needed are below.
Both RStan and R2jags use the same list structure for the model data. In in R2jags, you need to specify which variables you wish to monitor, while RStan defaults to all of the variables. Both require an external model file and have very similar commands to fit the model. So, within R, both packages are very similar.
When you run the code, you will note that JAGS starts the sampling immediately, while Stan requires a lengthy compilation step. You will also see that I used two different calls to “stan”.
stan.fit <- stan(file = 'rats.stan', data = rats_dat, verbose = FALSE, warmup=1000,iter = 11000, n_chains = 3) stan.fit2 <- stan(fit=stan.fit, data = rats_dat, verbose = FALSE, warmup=1000,iter = 11000, n_chains = 3)
The first compiles the model and runs the sampler, while the second uses the model from the first command. This eliminates the need to recompile the model, saving a good chunk of time. Notice the first argument has changed from "file=" to "fit=". I also discovered that adding the "thin" option to either "stan" call creates a segfault, which I have reported the the Stan maintainers.
The answers that each program produced were identical (to the first decimal, at least). Both gave means, standard deviations, and key quantiles of the posterior distributions of the parameters, as well as R-hat, a measure of convergence you would like close to 1. The plots are slightly different, with RStan color coding the R-hat value vs. reporting it for R2jags. The output from RStan does not include a DIC value, at least by default.
R2jags Output
Inference for Bugs model at "rats.bug", fit using jags, 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 9 n.sims = 3000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat alpha.c 242.390 3.712 237.001 240.654 242.431 244.237 247.839 1.129 alpha0 106.316 4.508 99.444 103.915 106.331 108.864 113.246 1.043 beta.c 6.185 0.109 5.968 6.116 6.185 6.258 6.394 1.002 sigma 6.136 1.182 5.266 5.776 6.065 6.395 7.088 1.001 tau.alpha 0.005 0.001 0.003 0.004 0.005 0.006 0.008 1.001 tau.beta 4.137 1.538 1.904 3.073 3.902 4.904 7.857 1.001 tau.c 0.027 0.004 0.020 0.024 0.027 0.030 0.036 1.001 deviance 967.610 23.168 941.121 957.316 966.043 976.319 998.467 1.001 n.eff alpha.c 3000 alpha0 3000 beta.c 1400 sigma 3000 tau.alpha 3000 tau.beta 3000 tau.c 3000 deviance 3000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 268.5 and DIC = 1236.1 DIC is an estimate of expected predictive error (lower deviance is better).
RStan output
Inference for Stan model: anon_model. 3 chains: each with iter=11000; warmup=1000; thin=1; 11000 iterations saved. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff alpha[1] 239.9 0.0 2.7 234.6 238.1 239.9 241.7 245.1 15672 alpha[2] 247.8 0.0 2.7 242.6 246.0 247.8 249.6 253.1 16865 alpha[3] 252.4 0.0 2.7 247.1 250.6 252.4 254.2 257.7 20956 alpha[4] 232.6 0.0 2.7 227.3 230.8 232.5 234.3 237.8 21050 alpha[5] 231.6 0.0 2.7 226.4 229.8 231.6 233.4 236.8 20250 alpha[6] 249.7 0.0 2.7 244.5 247.9 249.7 251.5 255.1 18570 alpha[7] 228.7 0.0 2.7 223.4 226.9 228.7 230.5 234.0 21257 alpha[8] 248.4 0.0 2.7 243.0 246.6 248.4 250.2 253.7 16278 alpha[9] 283.3 0.0 2.7 278.0 281.5 283.3 285.1 288.6 19244 alpha[10] 219.2 0.0 2.7 213.9 217.4 219.2 221.0 224.6 17471 alpha[11] 258.2 0.0 2.7 252.9 256.4 258.2 260.1 263.6 21935 alpha[12] 228.2 0.0 2.7 222.9 226.4 228.2 230.0 233.5 19933 alpha[13] 242.4 0.0 2.7 237.1 240.6 242.4 244.2 247.7 14641 alpha[14] 268.3 0.0 2.7 262.9 266.4 268.3 270.1 273.7 20241 alpha[15] 242.8 0.0 2.7 237.6 241.0 242.8 244.6 248.0 14321 alpha[16] 245.3 0.0 2.7 239.9 243.5 245.3 247.1 250.5 15314 alpha[17] 232.2 0.0 2.7 227.0 230.4 232.2 234.0 237.4 20874 alpha[18] 240.5 0.0 2.7 235.2 238.6 240.5 242.3 245.6 16301 alpha[19] 253.8 0.0 2.7 248.5 252.0 253.8 255.6 259.1 22001 alpha[20] 241.6 0.0 2.7 236.2 239.8 241.6 243.4 246.9 14330 alpha[21] 248.5 0.0 2.7 243.3 246.8 248.5 250.3 253.8 19178 alpha[22] 225.2 0.0 2.7 220.0 223.4 225.2 227.1 230.5 19358 alpha[23] 228.5 0.0 2.7 223.2 226.7 228.5 230.3 233.8 19742 alpha[24] 245.1 0.0 2.7 239.9 243.3 245.1 246.9 250.2 14794 alpha[25] 234.5 0.0 2.7 229.2 232.7 234.5 236.2 239.7 18778 alpha[26] 254.0 0.0 2.7 248.7 252.2 254.0 255.8 259.2 19957 alpha[27] 254.4 0.0 2.7 249.1 252.6 254.4 256.2 259.7 21171 alpha[28] 243.0 0.0 2.7 237.7 241.2 243.0 244.8 248.3 14643 alpha[29] 217.9 0.0 2.7 212.6 216.1 217.9 219.7 223.3 18663 alpha[30] 241.4 0.0 2.6 236.3 239.6 241.4 243.2 246.6 16161 beta[1] 6.1 0.0 0.2 5.6 5.9 6.1 6.2 6.5 1717 beta[2] 7.1 0.0 0.3 6.6 6.9 7.1 7.2 7.5 1052 beta[3] 6.5 0.0 0.2 6.0 6.3 6.5 6.6 7.0 1218 beta[4] 5.3 0.0 0.3 4.8 5.2 5.3 5.5 5.8 1187 beta[5] 6.6 0.0 0.2 6.1 6.4 6.6 6.7 7.0 1936 beta[6] 6.2 0.0 0.2 5.7 6.0 6.2 6.3 6.7 1651 beta[7] 6.0 0.0 0.2 5.5 5.8 6.0 6.1 6.4 1702 beta[8] 6.4 0.0 0.2 5.9 6.3 6.4 6.6 6.9 1926 beta[9] 7.1 0.0 0.3 6.6 6.9 7.1 7.2 7.6 1444 beta[10] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1514 beta[11] 6.8 0.0 0.2 6.3 6.6 6.8 7.0 7.3 780 beta[12] 6.1 0.0 0.2 5.7 6.0 6.1 6.3 6.6 1924 beta[13] 6.2 0.0 0.2 5.7 6.0 6.2 6.3 6.6 1283 beta[14] 6.7 0.0 0.3 6.2 6.5 6.7 6.9 7.2 1523 beta[15] 5.4 0.0 0.3 4.9 5.2 5.4 5.6 5.9 1000 beta[16] 5.9 0.0 0.2 5.4 5.8 5.9 6.1 6.4 1920 beta[17] 6.3 0.0 0.2 5.8 6.1 6.3 6.4 6.8 2247 beta[18] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1694 beta[19] 6.4 0.0 0.2 5.9 6.3 6.4 6.6 6.9 1184 beta[20] 6.1 0.0 0.2 5.6 5.9 6.1 6.2 6.5 1666 beta[21] 6.4 0.0 0.2 5.9 6.2 6.4 6.6 6.9 1618 beta[22] 5.9 0.0 0.2 5.4 5.7 5.9 6.0 6.3 2248 beta[23] 5.8 0.0 0.2 5.3 5.6 5.8 5.9 6.2 1597 beta[24] 5.9 0.0 0.2 5.4 5.7 5.9 6.0 6.4 2203 beta[25] 6.9 0.0 0.2 6.4 6.7 6.9 7.1 7.4 1778 beta[26] 6.5 0.0 0.2 6.1 6.4 6.5 6.7 7.0 1912 beta[27] 5.9 0.0 0.2 5.4 5.7 5.9 6.1 6.4 1546 beta[28] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1776 beta[29] 5.7 0.0 0.2 5.2 5.5 5.7 5.8 6.2 1514 beta[30] 6.1 0.0 0.2 5.7 6.0 6.1 6.3 6.6 1630 mu_alpha 242.4 0.0 2.8 236.9 240.6 242.4 244.3 247.9 12061 mu_beta 6.2 0.0 0.1 6.0 6.1 6.2 6.3 6.4 2855 sigmasq_y 37.3 0.1 5.7 27.8 33.2 36.7 40.7 50.0 2879 sigmasq_alpha 219.1 1.2 64.6 125.7 173.1 208.1 252.0 374.3 2934 sigmasq_beta 0.3 0.0 0.1 0.1 0.2 0.3 0.3 0.5 408 sigma_y 6.1 0.0 0.5 5.3 5.8 6.1 6.4 7.1 2864 sigma_alpha 14.7 0.0 2.1 11.2 13.2 14.4 15.9 19.3 2966 sigma_beta 0.5 0.0 0.1 0.4 0.5 0.5 0.6 0.7 402 alpha0 106.3 0.1 3.7 99.0 103.9 106.3 108.8 113.6 5365 lp__ -438.4 0.1 6.9 -453.0 -442.7 -437.9 -433.5 -426.2 2614 Rhat alpha[1] 1 alpha[2] 1 alpha[3] 1 alpha[4] 1 alpha[5] 1 alpha[6] 1 alpha[7] 1 alpha[8] 1 alpha[9] 1 alpha[10] 1 alpha[11] 1 alpha[12] 1 alpha[13] 1 alpha[14] 1 alpha[15] 1 alpha[16] 1 alpha[17] 1 alpha[18] 1 alpha[19] 1 alpha[20] 1 alpha[21] 1 alpha[22] 1 alpha[23] 1 alpha[24] 1 alpha[25] 1 alpha[26] 1 alpha[27] 1 alpha[28] 1 alpha[29] 1 alpha[30] 1 beta[1] 1 beta[2] 1 beta[3] 1 beta[4] 1 beta[5] 1 beta[6] 1 beta[7] 1 beta[8] 1 beta[9] 1 beta[10] 1 beta[11] 1 beta[12] 1 beta[13] 1 beta[14] 1 beta[15] 1 beta[16] 1 beta[17] 1 beta[18] 1 beta[19] 1 beta[20] 1 beta[21] 1 beta[22] 1 beta[23] 1 beta[24] 1 beta[25] 1 beta[26] 1 beta[27] 1 beta[28] 1 beta[29] 1 beta[30] 1 mu_alpha 1 mu_beta 1 sigmasq_y 1 sigmasq_alpha 1 sigmasq_beta 1 sigma_y 1 sigma_alpha 1 sigma_beta 1 alpha0 1 lp__ 1 Sample were drawn using NUTS2 at Wed Sep 5 21:55:07 2012. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
One interesting difference between the two outputs is the R-hat value for the alpha0 variable. Alpha0 is generated quantity, representing the value of the intercept when non-centered data is used. For R2jags, the value of R-hat is 1.228, while R-hat is 1 for RStan. A quick look at the output indicates that R2jags used a thin value of 9, while RStan defaults to 1 (and creates a segfault when changed).
There is a major speed difference between the two methods. Now, it should be noted that JAGS can be faster on some problems and Stan can be faster on others. On my machine, the JAGS sampler took 7.1 seconds, while the Stan sampler took 25 seconds, which was a bit shocking to me. And this was timed using the second "stan" call, when the Stan model was pre-compiled. This is something that needs to be explored further.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.