Docker Images for R: r-base versus r-apt

[This article was first published on R on datawookie, and kindly contributed to R-bloggers.]

I need to deploy a Plumber API in a Docker container. The API has some R package dependencies which need to be baked into the Docker image. There are a few options for the base image: r-base, tidyverse and r-apt.

The first option, r-base, would require building the dependencies from source, a somewhat time-consuming operation. The last option, r-apt, makes it possible to install most packages using apt, which is likely to be much quicker. I’ll immediately eliminate the other option, tidyverse: although it already contains a load of packages, many of them are not required and it also incorporates RStudio Server, which is definitely not necessary for this project.

I’m trying to optimise the image for two criteria: image size and build time.

Of these, the latter is the more important. Under normal circumstances I would not be too concerned about build time because Docker caches complete layers. However, this image is going to be built using a continuous integration system which will build the entire image from scratch each time.

r-base

Let’s start with the r-base image. Installing the required packages is as simple as executing R using RUN and then running install.packages(). The packages are installed from source, which takes some time (compiling source code and installing further dependencies).

FROM r-base

RUN R -e 'install.packages(c("plumber", "jsonlite", "dplyr", "stringr", "here"))'

r-apt

Using the r-apt image allows you to install binary versions of those packages.

FROM rocker/r-apt:bionic

RUN apt-get update && \
    apt-get install -y -qq \
        r-cran-plumber \
        r-cran-jsonlite \
        r-cran-dplyr \
        r-cran-stringr

RUN R -e 'install.packages("here")'

CMD ["R"]

The here package is not (currently) available as a binary, so you still need to compile it from source. This package alone does not have much impact on the build time.
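Whether a given CRAN package has a binary counterpart can be checked before writing the Dockerfile. The following is only a rough sketch: it assumes the Debian/Ubuntu convention of naming these packages r-cran- followed by the lowercased CRAN name, and the helper names apt_name and has_binary are made up for illustration. The has_binary helper relies on apt-cache, so it is meant to be run inside a container based on the r-apt image, after apt-get update.

```shell
# Map a CRAN package name to its assumed apt name (r-cran-<lowercase name>).
apt_name() {
  printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Succeed if apt knows a package with that name.
# Hypothetical helper: needs apt-cache, so run it inside the container.
has_binary() {
  apt-cache show "$(apt_name "$1")" > /dev/null 2>&1
}

# Print the assumed apt names for the packages used in this post.
for pkg in plumber jsonlite dplyr stringr here; do
  apt_name "$pkg"
done
```

Inside the container, something like has_binary here || echo "install here from source" would then flag the packages that still need install.packages().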

The base r-apt image will launch bash by default, so we need to explicitly start R using CMD.

Build Times

What’s the difference in build times? Well, it turns out that this really does make a big difference.

$ time docker build --no-cache -t r_apt -f Dockerfile-r-apt .
real    3m39.590s
user    0m0.115s
sys     0m0.082s
$ time docker build --no-cache -t r_base -f Dockerfile-r-base .
real    14m2.068s
user    0m0.467s
sys     0m0.456s

Using binary packages takes just under 4 minutes (there is still some download time and I’m not on a very fast connection). Building those packages from source is much more time-consuming, taking just over 14 minutes, which is almost four times as long.
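The ratio can be checked directly from the two real times reported above. Here is a small sketch using shell and awk; the helper name to_seconds is made up for illustration and only handles durations in the XmY.YYYs form that time prints.

```shell
# Convert a `time`-style duration such as 3m39.590s into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

apt_s=$(to_seconds 3m39.590s)    # 219.590
base_s=$(to_seconds 14m2.068s)   # 842.068

# Ratio of the source build time to the binary build time.
awk -v a="$apt_s" -v b="$base_s" 'BEGIN { printf "%.2f\n", b / a }'   # 3.83
```

So on this machine and connection the source build took roughly 3.8 times as long as the binary build.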

Image Sizes

What about the sizes of the images?

$ docker images
REPOSITORY                          TAG                 IMAGE ID            CREATED             SIZE
r_base                              latest              9e278980b348        2 minutes ago       944MB
r_apt                               latest              493e243d84fe        12 minutes ago      805MB

The image built using binary packages is smaller too: 805MB versus 944MB, or roughly 15% less.

Based on these results I’ll be using the r-apt base image for this project, but there’s no saying that r-base won’t come in handy for the next project!
