Introducing the Reproducible R Toolkit and the checkpoint package
Creating reproducible research is an important topic for many R users, so much so that several groups in the R community have tackled the problem, notably packrat from RStudio and gRAN from Genentech (see our previous blog post).
The Reproducible R Toolkit is a new open-source initiative from Revolution Analytics. It takes a simple approach to dealing with R package versions and consists of two parts: an R package, checkpoint, and an associated daily CRAN snapshot archive, checkpoint-server. Here's one illustration of the problem it solves (with apologies to xkcd):
checkpoint-server
To achieve reproducibility, we store daily snapshots of all CRAN packages. At midnight UTC each day we refresh the CRAN mirror and then store a snapshot of CRAN as it exists at that moment. You can access these daily snapshots using the checkpoint package, which installs and consistently uses these packages just as they existed at the snapshot date. Daily snapshots are available from 2014-09-17 onward.
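Because snapshots begin on 2014-09-17, a script can check a date before relying on it. The sketch below is illustrative only: is_valid_snapshot_date and first_snapshot are hypothetical names, not part of the checkpoint package, and the check assumes a snapshot exists for every day from the first one up to today.

```r
# Hypothetical helper (not part of the checkpoint package): checks whether a
# "YYYY-MM-DD" string falls within the range of daily snapshots.
first_snapshot <- as.Date("2014-09-17")  # earliest snapshot, per the text above

is_valid_snapshot_date <- function(date_string, today = Sys.Date()) {
  # With an explicit format, as.Date() returns NA for strings it cannot parse
  d <- as.Date(date_string, format = "%Y-%m-%d")
  !is.na(d) && d >= first_snapshot && d <= today
}

is_valid_snapshot_date("2014-09-17")  # TRUE
is_valid_snapshot_date("2014-01-01")  # FALSE: earlier than the first snapshot
```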
checkpoint package
The goal of the checkpoint package is to solve the problem of package reproducibility in R. Since packages get updated on CRAN all the time, it can be difficult to recreate an environment where all your packages are consistent with some earlier state. To solve this issue, checkpoint allows you to install packages locally as they existed on a specific date from the corresponding snapshot (stored on the checkpoint server) and it configures your R session to use only these packages. Together, the checkpoint package and the checkpoint server act as a “CRAN time machine”, so that anyone using checkpoint can ensure the reproducibility of scripts or projects at any time.
How to use checkpoint
Once you have the checkpoint package installed, using the checkpoint() function is as simple as loading the package and calling checkpoint() with a snapshot date at the top of your script.
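A minimal sketch of the top of such a script, assuming the checkpoint package is installed and the snapshot server is reachable. The date shown is simply the earliest snapshot mentioned above; substitute your own.

```r
# Load checkpoint and pin this session to a CRAN snapshot date.
library(checkpoint)
checkpoint("2014-09-17")  # example date only: use the date your script was written

# The rest of the script follows; library() calls below this point
# use the packages installed for the snapshot date.
```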
Typically, you will use the date you created the script as the argument to checkpoint(). The first time you run the script, checkpoint will inspect it (and other R files in the same project folder) for the packages used, and install the required packages with versions as of the specified date. This first run takes some time, because the packages must be downloaded and installed, but subsequent runs use the previously installed package versions.
The checkpoint package installs the packages in a folder specific to the current project (in a subfolder of
If you later want to update the packages you use, just change the date in the checkpoint() call and checkpoint() will automatically update the locally installed packages.
Installing the checkpoint package
The checkpoint package is available on CRAN:
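Assuming your R session is configured with a CRAN mirror and has network access, the standard install.packages() call should be all that is needed:

```r
# Install the checkpoint package from CRAN.
install.packages("checkpoint")
```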
Worked example
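The following is a sketch of how a session might look, not a definitive recipe. It assumes the checkpoint package is installed, the snapshot server is reachable, and the packages from the chosen snapshot can be installed under your version of R. The folder name checkpoint_example and the file analysis.R are made up for illustration.

```r
# Create a throwaway project folder and make it the working directory.
project_dir <- file.path(tempdir(), "checkpoint_example")  # hypothetical name
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)

# Write a small script so that checkpoint has a library() call to detect.
writeLines("library(MASS)", "analysis.R")

# Pin the session to the snapshot; the first run downloads and installs packages.
library(checkpoint)
checkpoint("2014-09-17")

# Inspect how the session is now configured.
getOption("repos")  # repository the session installs from
.libPaths()         # library paths the session searches
rownames(installed.packages(lib.loc = .libPaths()[1]))  # packages in the first library

# Restore the original working directory.
setwd(old_wd)
```

Comparing the output of getOption("repos") and .libPaths() before and after the checkpoint() call is a quick way to confirm that the session has switched to the snapshot.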
To find out more:
Feedback and Thanks
The Reproducible R Toolkit was created by the Open Source Solutions group at Revolution Analytics. Special thanks go to Scott Chamberlain, who helped with early development.
We'd love to know what you think about checkpoint. Leave comments here on the blog, or via the checkpoint GitHub page.