Docker Images for R: r-base versus r-apt
I need to deploy a Plumber API in a Docker container. The API has some R package dependencies which need to be baked into the Docker image. There are a few options for the base image:
- r-base,
- tidyverse and
- r-apt.
The first option, r-base, would require building the dependencies from source, a somewhat time-consuming operation. The last option, r-apt, makes it possible to install most packages using apt, which is likely to be much quicker. I’ll immediately eliminate the other option, tidyverse, because although it already contains a load of packages, many of those are not required and, in addition, it incorporates RStudio Server, which is definitely not necessary for this project.
I’m trying to optimise the image for two criteria:
- small image and
- quick build time.
Of these the latter is the more important. Under normal circumstances I would not be too concerned about build time because Docker caches complete layers. However, this image is going to be built using a continuous integration system which will build the entire image from scratch each time.
r-base
Let’s start with the r-base image. Installing the required packages is as simple as executing R using RUN and then running install.packages(). The packages are installed from source, which takes some time (compiling source code and installing further dependencies).
FROM r-base

RUN R -e 'install.packages(c("plumber", "jsonlite", "dplyr", "stringr", "here"))'
r-apt
Using the r-apt image allows you to install binary versions of those packages.
FROM rocker/r-apt:bionic

RUN apt-get update && \
    apt-get install -y -qq \
        r-cran-plumber \
        r-cran-jsonlite \
        r-cran-dplyr \
        r-cran-stringr

RUN R -e 'install.packages("here")'

CMD ["R"]
The here package is not (currently) available as a binary, so you still need to compile it from source. This package alone does not have much impact on the build time.
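When checking whether a binary exists for a given package, it helps to know that Debian and Ubuntu name their CRAN binaries r-cran- followed by the lowercased package name. The snippet below is a minimal sketch of that mapping; the function name cran_to_apt is hypothetical and not part of any tool, and the mapping only produces a candidate name, it does not guarantee that the package exists in the repository.

```shell
#!/bin/sh
# Hypothetical helper: map CRAN package names to the candidate
# Debian/Ubuntu binary package names (r-cran-<lowercased name>).
cran_to_apt() {
    for pkg in "$@"; do
        # Lowercase the CRAN name and add the r-cran- prefix.
        printf 'r-cran-%s\n' "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
    done
}

# Print the apt names for the packages installed as binaries above.
cran_to_apt plumber jsonlite dplyr stringr
```

Inside the container, a candidate name can then be looked up with something like apt-cache show "$(cran_to_apt here)" after apt-get update; if apt reports that no such package is found, the package has to be installed from source with install.packages().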
The base r-apt image will launch bash by default, so we need to explicitly start R using CMD.
Build Times
What’s the difference in build times? Well, it turns out that this really does make a big difference.
$ time docker build --no-cache -t r_apt -f Dockerfile-r-apt .

real    3m39.590s
user    0m0.115s
sys     0m0.082s

$ time docker build --no-cache -t r_base -f Dockerfile-r-base .

real    14m2.068s
user    0m0.467s
sys     0m0.456s
Using binary packages takes just under 4 minutes (there is still some download time and I’m not on a very fast connection). Building those packages from source is much more time-consuming: at just over 14 minutes it takes almost four times as long.
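As a quick check on that comparison, the real times reported by time can be converted to seconds and divided. This is a minimal sketch; the helper name to_seconds is hypothetical, and it assumes the <minutes>m<seconds>s format shown above (so builds under an hour).

```shell
#!/bin/sh
# Hypothetical helper: convert a `real` value from `time` (e.g. 3m39.590s)
# into seconds. Assumes the <minutes>m<seconds>s format.
to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

apt_secs=$(to_seconds 3m39.590s)    # r-apt build (binary packages)
base_secs=$(to_seconds 14m2.068s)   # r-base build (from source)

# Ratio of the source build time to the binary build time.
ratio=$(LC_ALL=C awk -v a="$apt_secs" -v b="$base_secs" 'BEGIN { printf "%.1f\n", b / a }')
echo "r-base build took ${ratio}x as long as r-apt"
```

With the timings above this works out to roughly 3.8, so the source build takes close to four times as long as the binary one.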
Image Sizes
What about the sizes of the images?
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
r_base       latest   9e278980b348   2 minutes ago    944MB
r_apt        latest   493e243d84fe   12 minutes ago   805MB
The image built using binary packages is smaller too: 805 MB versus 944 MB, a saving of 139 MB.
Based on these results I’ll be using the r-apt base image for this project, but there’s no saying that r-base won’t come in handy for the next project!