R, Docker and Checkpoint: A Route to Reproducibility
I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?
Only one reasonable way to find out: give it a try.
Below is the simple Dockerfile I used. Here are the fundamental components of what it does:
- Derive from the R 3.6.1 image from rocker.
- Create an environment variable CHECKPOINT_DATE with the snapshot date for {checkpoint}.
- Install the {checkpoint} package.
- Make a snapshot folder for {checkpoint}.
- Add commands to .Rprofile which will load {checkpoint} and select the required snapshot.
- Install a sample package under {checkpoint}.
FROM rocker/r-ver:3.6.1

ENV CHECKPOINT_DATE 2018-12-01

RUN R -e "install.packages('checkpoint')" && \
    mkdir -p /root/.checkpoint/${CHECKPOINT_DATE} && \
    echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" >~/.Rprofile && \
    R -e "install.packages('colorspace')"
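Building the image from this Dockerfile might look like the sketch below. The image tag `r-checkpoint` and the helper name `build_image` are assumptions for illustration; the original post does not name either.

```shell
# Hypothetical image tag; the original post does not name one.
IMAGE_TAG="r-checkpoint"

# Build the image. The build context defaults to the current directory,
# which is assumed to contain the Dockerfile shown above.
build_image() {
  docker build -t "$IMAGE_TAG" "${1:-.}"
}

# Usage (run from the directory containing the Dockerfile):
#   build_image
```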
After building the Docker image you’re ready to give this a whirl. There are (at least) two ways that you could use this:
- install packages onto the image (which bloats the image and means rebuilding it whenever a new package is needed) or
- share a volume from the host, containing a {checkpoint} snapshot folder, with the Docker container.
Packages on the Image
Let’s look at the option of installing packages on the image first. The Dockerfile above already installs the {colorspace} package. Let’s test that out.
In the screenshot below the top panel shows the launch of the image and successful loading of the {colorspace} package. The lower panel connects a shell to the running container and lists the contents of the snapshot folder to confirm that the {colorspace} package is there.
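The two steps described above could be reproduced roughly as follows. This is a sketch, not the exact commands from the screenshot: the image tag, the container name and the helper function names are all hypothetical, and using `find` to locate the package folder is my own choice. The path `/root/.checkpoint` comes from the Dockerfile.

```shell
# Hypothetical names; the original post does not name the image or container.
IMAGE_TAG="r-checkpoint"
CONTAINER_NAME="r-checkpoint-test"

# Terminal 1: start an interactive R session in a named container.
# The .Rprofile written by the Dockerfile activates the snapshot on startup,
# so library(colorspace) can then be tried at the R prompt.
run_r_session() {
  docker run --rm -it --name "$CONTAINER_NAME" "$IMAGE_TAG"
}

# Terminal 2: look for the installed package inside the snapshot folder
# of the running container.
find_colorspace() {
  docker exec "$CONTAINER_NAME" find /root/.checkpoint -type d -name colorspace
}
```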
Packages on the Host
If you share a snapshot folder from the host with the container then you get a lot more flexibility.
In the screenshot below the top panel shows the launch of the image, where the ~/.checkpoint folder on the host is shared with the container. Now it’s possible to select any of the snapshots present on the host. For example, rather than choosing the 2018-12-01 snapshot installed on the image, we can now select the 2019-06-01 snapshot from the host.
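Sharing the host's snapshot folder could be done with a bind mount along these lines. Again this is a sketch under assumptions: the image tag and the function name are hypothetical, and the container-side path `/root/.checkpoint` is taken from the Dockerfile.

```shell
# Hypothetical image tag; the original post does not name one.
IMAGE_TAG="r-checkpoint"

# Mount the host's ~/.checkpoint over the container's snapshot folder, so that
# every snapshot present on the host becomes visible inside the container.
run_with_host_snapshots() {
  docker run --rm -it -v "$HOME/.checkpoint:/root/.checkpoint" "$IMAGE_TAG"
}
```

Note that the mount hides whatever the image itself stored under `/root/.checkpoint`, and that the snapshot date in `.Rprofile` was fixed when the image was built. Selecting a different snapshot therefore means calling, for instance, `checkpoint('2019-06-01', scanForPackages = FALSE)` in the R session, assuming that snapshot exists on the host.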
Of these two options the latter seems like a more flexible solution. If, however, your aim is to provide a Docker image with a complete (and reproducible) computational environment, then the former is definitely the way to go: it’s less flexible but the package versions are all locked down on the image.