How to compile R Markdown documents using Docker
I do some freelance web programming and got a request from a client to make a new monthly
sales report for their web shop.
After specifying what should be in it, I thought to myself, “this would be so quick to make with R Markdown and ggplot2 … but wait, why not make it with R Markdown and ggplot2?”
This was actually a perfect case for it, since the report needed to be a printable pdf and come out once a month. So no need for shiny interactivity this time (no pun intended), just a formal-looking sales report with figures and tables in it. Let’s do it.
The first thing to think about was how to generate the pdf reports on the web server. Generating pdfs with R Markdown requires R first of all, but also a LaTeX installation, and as you might guess, neither was available on the web server. An easy way to avoid installing additional packages on the web server itself is to use Docker containers that contain the required packages.
Using R Markdown in a Docker container
As mentioned above, using a Docker container saves me from having to install R, LaTeX and other dependencies on the server itself. They don’t need to be maintained there, and I can easily deploy the container to any other web server if the need arises.
I’m going to use the rocker/verse image from the rocker R images for creating the report.
This image has the R Markdown and LaTeX systems pre-installed for compiling pdf reports.
If you are new to Docker, the official documentation explains how to install it on your system.
Assuming you have Docker installed, let’s pull the rocker/verse image with R version 3.5.1:
docker pull rocker/verse:3.5.1
First I’m going to test the image to see what R packages I possibly need to add to the image.
This image also has RStudio pre-installed, and it is configured to run the RStudio server by default.
So for testing, running the following command will start RStudio at localhost:8787:
docker run --rm -p 8787:8787 rocker/verse:3.5.1
Once you type localhost:8787 into your browser, you will be asked for a username and password, which are both rstudio by default.
To my happy surprise, all the packages that I needed were already installed, so I could just go ahead and run the pdf compilation.
Generating the pdf with Docker
Let’s begin by compiling pdf reports with the rocker/verse:3.5.1 image.
I have provided a git repository that contains example files for creating pdf reports
with R Markdown and Docker.
You can clone it to your computer by running:
git clone https://github.com/jlintusaari/R-docker-report.git
The repository contains an example_report.R script that takes a csv file as input and generates a pdf report using the example_report.Rmd template below:
---
title: "Texas housing sales report"
author: "Matti Meikäläinen"
date: "`r paste(Sys.Date())`"
output: pdf_document
---

## Number of sales

The following chart shows the number of housing sales in three cities in Texas.
Data is from the `txhousing` data set provided by the TAMU real estate center.

```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(dt, aes(date, sales, color=city)) + geom_point() + geom_smooth()
```
The generated pdf looks like this:
The example csv data is stored in the data.csv file and was created with the R/make_csv.R script.
Assuming we are in the folder where the example_report.R file is located, we run the following command to compile the report with Docker:
docker run --rm -v $PWD:/report -w /report rocker/verse:3.5.1 \
  Rscript --vanilla example_report.R data.csv
The compilation works, but there are multiple things that could be improved:
- The default user in the container is root, causing the generated pdf to be owned by root
- The above command is rather long and requires setting e.g. the working directory with -w
- With my actual report, the LaTeX system was missing some packages; they were installed automatically, but this made the pdf compilation slow
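The first point can also be worked around without a new image, because docker run accepts a --user flag that sets the UID and GID the container process runs as. The following is only a sketch under a few assumptions: the helper name compile_as_me is made up for illustration, the host is Linux with a bind-mounted folder, and nothing in the compilation needs to write outside that folder (LaTeX installing a missing package, for example, may fail under an arbitrary UID).

```shell
# Compile the report as the calling host user rather than root.
# `compile_as_me` is a hypothetical helper name, not part of the repository.
compile_as_me() {
  # $1: host folder holding example_report.R and data.csv (default: current folder)
  report_dir=${1:-$PWD}
  # DOCKER can be overridden (e.g. DOCKER=echo) to preview the command
  # instead of running it.
  "${DOCKER:-docker}" run --rm \
    --user "$(id -u):$(id -g)" \
    -v "$report_dir":/report -w /report \
    rocker/verse:3.5.1 \
    Rscript --vanilla example_report.R data.csv
}
```

This only helps with file ownership. The command gets even longer, and the slow LaTeX package installation is untouched.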
So let’s create a new Docker image based on rocker/verse:3.5.1 that is better configured for our purposes.
We will do this using a Dockerfile.
The following Dockerfile starts the image creation from the rocker/verse:3.5.1 image and adds configurations to address the above issues:
FROM rocker/verse:3.5.1

# My sales report required an additional latex package called `eurosym`.
# RUN tlmgr install eurosym

# Set a user and the working directory
USER rstudio
WORKDIR /report

# Set the container to run `Rscript --vanilla ` by default
ENTRYPOINT ["/usr/local/bin/Rscript", "--vanilla"]

# Set the `example_report.R data.csv` as the default script to run with ENTRYPOINT's Rscript
CMD ["example_report.R", "data.csv"]
You can find this file in the docker folder of the git repository.
Now, assuming you are in the docker folder where the Dockerfile is located, you can create the new image with:
docker build -t report-maker .
After this we can use the report-maker image to make the pdf compilation both faster and more convenient.
The following command will create the report (remember to remove the root-owned example_report.pdf first if you haven’t):
docker run --rm -v $PWD:/report report-maker
Of course, our report-maker image can be used to run any kind of R script, but the name is descriptive for our purpose.
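This works because the Dockerfile sets Rscript as the ENTRYPOINT and the script name only as CMD: any arguments placed after the image name in docker run replace CMD and are handed to Rscript instead. A minimal sketch of a wrapper that relies on this follows; the function name run_in_report_maker and the file names in the usage note are hypothetical.

```shell
# Run an R script from the current folder inside the report-maker image.
# `run_in_report_maker` is a hypothetical helper name.
run_in_report_maker() {
  # "$@" expands to nothing when no arguments are given, so the image's
  # default CMD (example_report.R data.csv) stays in effect.
  # DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
  "${DOCKER:-docker}" run --rm -v "$PWD":/report report-maker "$@"
}
```

Calling run_in_report_maker other_report.R other.csv would then run Rscript --vanilla other_report.R other.csv in the container, while calling it without arguments compiles the example report as before.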
Production use
The web application I’m deploying this for is implemented with Ruby on Rails. Rails provides a built-in task system that can easily access the database of the web application to retrieve relevant data for report creation.
In this particular case the pdf reports need to be generated once a month. I will thus create a Rails task that:
- Queries the database and saves the relevant information to a csv file
- Provides the generated csv file to the report-maker
After that I set up a cron job that runs the task on the first day of every month. Once generated, the report is downloadable for the client through the web application.
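The container side of such a monthly job could be sketched roughly as follows. Everything named here is an assumption for illustration, since the Rails task itself is not shown: the task name reports:monthly, the folder /srv/shop and the 06:00 schedule are all hypothetical. The function is something the task could shell out to after writing the csv; it refuses to start the container when the csv is missing or empty, so a failed export does not produce an empty report.

```shell
# Hypothetical crontab entry: 06:00 on the first day of every month.
#   0 6 1 * * cd /srv/shop && bin/rails reports:monthly

# Compile the monthly report from the csv that the Rails task wrote.
# `monthly_report` is a hypothetical helper name.
monthly_report() {
  # $1: host folder containing example_report.R and the exported data.csv
  report_dir=$1
  # -s is true only if the file exists and is not empty
  if [ ! -s "$report_dir/data.csv" ]; then
    echo "monthly_report: $report_dir/data.csv is missing or empty" >&2
    return 1
  fi
  # DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
  "${DOCKER:-docker}" run --rm -v "$report_dir":/report report-maker
}
```

Returning a non-zero status on a missing csv lets the calling task or cron notice the failure instead of silently serving an old report.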