R, Stan and Bayesian Statistics
by Joseph Rickert
Just about two and a half years ago I wrote about some resources for doing Bayesian statistics in R. Motivated by the tutorial Modern Bayesian Tools for Time Series Analysis by Harte and Weylandt that I attended at R/Finance last month, and the upcoming tutorial An Introduction to Bayesian Inference using R Interfaces to Stan that Ben Goodrich is going to give at useR!, I thought I'd look into what's new. Well, Stan is what's new! Yes, Stan has been under development and available for some time. But somehow, while I wasn't paying close attention, two things happened: (1) the rstan package evolved to make the mechanics of doing Bayesian analysis in R really easy and (2) the Stan team produced and/or organized an amazing amount of documentation.
My impressions of doing Bayesian analysis in R were set in the WinBUGS era. The separate WinBUGS installation was always tricky, and then moving between the BRugs and R2WinBUGS packages presented some additional challenges. My recent Stan experience was nothing like this. I had everything up and running in just a few minutes. The directions for getting started with rstan are clear and explicit about making sure that you have the right tool chain in place for your platform. Since I am running R 3.3.0 on Windows 10, I installed Rtools34. This went quickly and as expected, except that C:\Rtools\gcc-4.x-y\bin did not show up in my Path variable. Not a big deal: I used the menus in the Windows System Properties box to edit the Path statement by hand. After this, rstan installed like any other R package and I was able to run the 8schools example from the package vignette. The following 10-minute video by Ehsan Karim takes you through the install process and the vignette example.
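If you hit the same Path problem, a quick check from the R console can confirm whether the tool chain is visible before you try to install rstan. The following is only a minimal sketch in base R: the helper names path_entries and path_mentions are made up for this illustration and are not part of rstan or Rtools, and the install line is left commented out.

```r
# Hypothetical helpers (not from rstan or Rtools): inspect the PATH from R.

# Split a PATH string into its entries, using the platform's separator
# (";" on Windows, ":" elsewhere).
path_entries <- function(path = Sys.getenv("PATH"), sep = .Platform$path.sep) {
  strsplit(path, sep, fixed = TRUE)[[1]]
}

# TRUE if any PATH entry matches `pattern` (case-insensitive).
path_mentions <- function(pattern, path = Sys.getenv("PATH"),
                          sep = .Platform$path.sep) {
  any(grepl(pattern, path_entries(path, sep), ignore.case = TRUE))
}

cat("Rtools mentioned on PATH:", path_mentions("Rtools"), "\n")

# Sys.which() returns "" when the compiler cannot be found on the PATH.
cat("g++ found at:", Sys.which("g++"), "\n")

# Once the compiler is visible, rstan installs like any other package:
# install.packages("rstan", dependencies = TRUE)
```

If the first line reports FALSE on Windows, editing the Path by hand as described above, and then restarting R, should be enough for the compiler to be found.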
The Stan documentation includes four major components: (1) The Stan Language Manual, (2) Examples of fully worked-out problems, (3) Contributed Case Studies and (4) both slides and video tutorials. This is an incredibly rich cache of resources that makes a very credible case for the ambitious project of teaching people with some R experience both Bayesian Statistics and Stan at the same time. The “trick” here is that the documentation operates at multiple levels of sophistication with entry points for students with different backgrounds. For example, a person with some R and the modest statistics background required for approaching Gelman and Hill's extraordinary text, Data Analysis Using Regression and Multilevel/Hierarchical Models, can immediately begin running rstan code for the book's examples. To run the rstan version of the example in section 5.1, Logistic Regression with One Predictor, with no changes, a student needs only to copy the R scripts and data into her local environment. In this case, she would need the R script: 5._LogisticRegressionWithOnePredictor.R, the data: nes1992_vote.data.R and the Stan code: nes_logit.stan. The Stan code for this simple model is about as straightforward as it gets: variable declarations, parameter identification and the model itself.
    data {
      int<lower=0> N;
      vector[N] income;
      int<lower=0,upper=1> vote[N];
    }
    parameters {
      vector[2] beta;
    }
    model {
      vote ~ bernoulli_logit(beta[1] + beta[2] * income);
    }
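To see what the model block is asking for, it can help to write the same log-posterior by hand in R. The sketch below makes several assumptions: the data are simulated stand-ins, not the NES 1992 survey, the "true" coefficients are arbitrary illustrative values, and the posterior mode is found with optim() rather than sampled with Stan's MCMC. Because the Stan program states no priors, the priors on beta are flat and the posterior mode should coincide with the maximum likelihood fit from glm(). The final block hands the same data to rstan::stan() only if rstan is installed and nes_logit.stan is in the working directory.

```r
# Simulated stand-in data (NOT the NES 1992 data used in the book).
set.seed(1)
N <- 500
income <- sample(1:5, N, replace = TRUE)             # five income categories
vote <- rbinom(N, 1, plogis(-1.4 + 0.33 * income))   # illustrative coefficients

# Negative log-posterior implied by the Stan model block:
# flat priors on beta plus a Bernoulli likelihood with a logit link.
neg_log_post <- function(beta) {
  eta <- beta[1] + beta[2] * income
  -sum(vote * eta - log1p(exp(eta)))
}

# Analytic gradient of the negative log-posterior.
neg_grad <- function(beta) {
  p <- plogis(beta[1] + beta[2] * income)
  -c(sum(vote - p), sum((vote - p) * income))
}

# Posterior mode; with flat priors this is the maximum likelihood estimate.
fit_mode <- optim(c(0, 0), neg_log_post, gr = neg_grad, method = "BFGS")
fit_glm <- glm(vote ~ income, family = binomial)
print(rbind(mode = fit_mode$par, glm = unname(coef(fit_glm))))

# Optional: sample from the posterior with rstan, if it is available.
if (requireNamespace("rstan", quietly = TRUE) && file.exists("nes_logit.stan")) {
  fit <- rstan::stan(file = "nes_logit.stan",
                     data = list(N = N, income = income, vote = vote),
                     iter = 1000, chains = 4)
  print(fit)
}
```

The two rows of the printed matrix should agree closely. What Stan adds is the full posterior distribution for beta rather than a single point estimate.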
Running the script will produce the iconic logistic regression plot:
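For readers without the figure in front of them, a plot of this kind can be drawn in base R along the following lines. This is only a sketch under stated assumptions: the data are simulated rather than taken from the NES file, and the coefficients come from glm() as a stand-in for the posterior means of beta that the rstan script would supply.

```r
# Simulated stand-in data (NOT the NES 1992 data).
set.seed(1)
N <- 500
income <- sample(1:5, N, replace = TRUE)
vote <- rbinom(N, 1, plogis(-1.4 + 0.33 * income))

# Stand-in for the posterior means of beta[1] and beta[2].
beta <- unname(coef(glm(vote ~ income, family = binomial)))

# Jittered 0/1 outcomes with the fitted logistic curve on top.
plot(jitter(income, 0.5), jitter(vote, 0.2),
     pch = 20, cex = 0.4,
     xlab = "Income", ylab = "Pr(vote = 1)")
curve(plogis(beta[1] + beta[2] * x), from = 0.5, to = 5.5, add = TRUE)
```

Jittering keeps the points from stacking on top of one another at the five income levels, which is what gives the plot its familiar look.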
I'll wind down by curbing my enthusiasm just a little by pointing out that Stan is not the only game in town. JAGS is a popular alternative, and there is plenty that can be done with unaugmented R code alone as the Bayesian Inference Task View makes abundantly clear.
If you are a book person and new to Bayesian statistics, I highly recommend Bayesian Essentials with R by Jean-Michel Marin and Christian Robert. The authors provide a compact introduction to Bayesian statistics that is backed up with numerous R examples. Also, the new book by Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan looks like it is going to be an outstanding read. The online supplements to the book are certainly worth a look.
Finally, if you are a Bayesian or are thinking about becoming one and you are going to useR!, be sure to catch the following talks:
- Bayesian analysis of generalized linear mixed models with JAGS, by Martyn Plummer
- bamdit: An R Package for Bayesian meta-Analysis of diagnostic test data by Pablo Emilio Verde
- Fitting complex Bayesian models with R-INLA and MCMC by Virgilio Gómez-Rubio
- bayesboot: An R package for easy Bayesian bootstrapping by Rasmus Arnling Bååth
- An Introduction to Bayesian Inference using R Interfaces to Stan by Ben Goodrich
- DiLeMMa – Distributed Learning with Markov Chain Monte Carlo Algorithms Using the ROAR Package by Ali Zaidi