Stan Roundup, 10 November 2017

Bob Carpenter

5 years ago

[This article was first published on R – Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We’re in the heart of the academic season and there’s a lot going on.

James Ramsey reported a critical performance regression bug in Stan 2.17 (this affects the latest CmdStan and PyStan, not the latest RStan). Sean Talts and Daniel Lee diagnosed the underlying problem as being with the change from char* to std::string arguments—you can’t pass char* and rely on the implicit std::string constructor without the penalty of memory allocation and copying. The reversion goes back to how things were before with const char* arguments. Ben Goodrich is working with Sean Talts to cherry-pick the performance regression fix to Stan that led to a very slow 2.17 release for the other interfaces. RStan 2.17 should be out soon, and it will be the last pre-C++11 release. We’ve already opened the C++11 floodgates on our development branches (yoo-hoo!).
Quentin F. Gronau, Henrik Singmann, E. J. Wagenmakers released the bridgesampling package in R. Check out the arXiv paper. It runs with output from Stan and JAGS.
Andrew Gelman and Bob Carpenter‘s proposal was approved by Coursera for a four-course introductory concentration on Bayesian statistics with Stan: 1. Bayesian Data Analysis (Andrew), 2. Markov Chain Monte Carlo (Bob), 3. Stan (Bob), 4. Multilevel Regression (Andrew). The plan is to finish the first two by late spring and the second two by the end of the summer in time for Fall 2018. Advait Rajagopal, an economics Ph.D. student at the New School is going to be leading the exercise writing, managing the Coursera platform, and will also TA the first few iterations. We’ve left open the option for us or others to add a prequel and sequel, 0. Probability Theory, and 5. Advanced Modeling in Stan.
Dan Simpson is in town and dropped a casual hint that order statistics would clean up the discretization and binning issues that Sean Talts and crew were having with the simulation-based algorithm testing framework (aka the Cook-Gelman-Rubin diagnostics). Lo-and-behold, it works. Michael Betancourt worked through all the math on our (chalk!) board and I think they are now ready to proceed with the paper and recommendations for coding in Stan. As I’ve commented before, one of my favorite parts of working on Stan is watching the progress on this kind of thing from the next desk.
Michael Betancourt tweeted about using Andrei Kascha‘s javascript-based vector field visualization tool for visualizing Hamiltonian trajectories and with multiple trajectories, the Hamiltonian flow. Richard McElreath provides a link to visualizations of the fields for light, normal, and heavy-tailed distributions. The Cauchy’s particularly hypnotic, especially with many fewer particles and velocity highlighting.
Krzysztof Sakrejda finished the fixes for standalone function generation in C++. This lets you generate a double- and int-only version of a Stan function for inclusion in R (or elsewhere). This will go into RStan 2.18.
Sebastian Weber reports that the Annals of Applied Statistics paper, Bayesian aggregation of average data: An application in drug development, was finally formally accepted after two years in process. I think Michael Betancourt, Aki Vehtari, Daniel Lee, and Andrew Gelman are co-authors.
Aki Vehtari posted a case study for review on extreme-value analysis and user-defined functions in Stan [forum link — please comment there].
Aki Vehtari, Andrew Gelman and Jonah Gabry have made a major revision of Pareto smoothed importance sampling paper with improved algorithm, new Monte Carlo error and convergence rate results, new experiments with varying sample size and different functions. The next loo package release will use the new version.
Bob Carpenter (it’s weird writing about myself in the third person) posted a case study for review on Lotka-Volterra predator-prey population dynamics [forum link — please comment there].
Sebastian and Sean Talts led us through the MPI design decisions about whether to go with our own MPI map-reduce abstraction or just build the parallel map function we’re going to implement in the Stan language. Pending further review from someone with more MPI experience, the plan’s to implememt the function directly, then worry about generalizing when we have more than one function to implement.
Matt Hoffman (inventor of the original NUTS algorithm and co-founder of Stan) dropped in on the Stan meeting this week and let us know he’s got an upcoming paper generalizing Hamiltonian Monte Carlo sampling and that his team at Google’s working on probabilistic modeling for Tensorflow.
Mitzi Morris, Ben Goodrich, Sean Talts and I sat down and hammered out the services spec for running the generated quantities block of a Stan program over the draws from a previous sample. This will decouple the model fitting process and the posterior predictive inference process (because the generated quantities block generates a ỹ according to p(ỹ | θ) where ỹ is a vector of predictive quantities and θ is the vector of model parameters. Mitzi then finished the coding and testing and it should be merged soon. She and Ben Bales are working on getting it into CmdStan and Ben Goodrich doesn’t think it’ll be hard to add to RStan.
Mitzi Morris extended the spatial case study with leave-one-out cross-validation and WAIC comparisons of the simple Poisson model, a heterogeneous random effects model, a spatial random effects model, and a combined heterogeneous and spatial model with two different prior configurations. I’m not sure if she posted the updated version yet (no, because Aki is also in town and suggested checking Pareto khats, which said no).
Sean Talts split out some of the longer tests for less frequent application to get distribution testing time down to 1.5 hours to improve flow of pull requests.
Sean Talts is taking another one for the team by leading the charge to auto-format the C++ code base and then proceed with pre-commit autoformat hooks. I think we’re almost there after a spirited discussion of readability and our ability to assess it.
Sean Talts also added precompiled headers to our unit and integration tests. This is a worthwhile speedup when running lots of tests and part of the order of magnitude speedup Sean’s eked out.

ps. some edits made by Aki

To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Modeling, Causal Inference, and Social Science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.