
Using Containerized Travis-CI to check R in RMarkdown files

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers.]

I’m teaching a short graduate seminar on Data Visualization with R this semester. Following Matt Salganik, I wanted students to be able to submit homework or other assignments as R Markdown files, but to have a way to make sure their R code passed some basic stylistic checks provided by lintr before they submitted it to me. Students write .Rmd files containing discussion or notes interspersed with chunks of R code. We just want to check that the code meets some minimal level of syntactic and stylistic correctness. This makes the code easier to read at the time and also easier to return to later. It’s a useful habit to have beyond the context of homework assignments for a particular course, too.

When I write things in R locally I usually have lintr running in the background of my text editor. It’s supported in RStudio, Emacs, Vim, and other editors, as detailed on the lintr development page. But the idea of linting via GitHub and Travis-CI is also appealing, especially when students are submitting assignments or code snippets on GitHub anyway. Travis CI is a “continuous integration” service designed for much heavier lifting than I’m doing here. It’s for software developers who want to check their code as they go, making sure it compiles and passes various tests. But we can use it to quickly check our code.

Two things have changed since Matt’s helpful writeup of the process, both of which make life easier. First, lintr can now check .Rmd files natively, so we don’t have to write a script to manually extract the R code before linting it. Second, Travis can containerize builds so that they run faster. More on this in a second. Containerization on Travis-CI means some aspects of the development environment are more restrictive than they would otherwise be. But this doesn’t matter to us right now.

Here’s an example. We create a GitHub repository called lintscreen and set it up so that Travis-CI will see it. Travis’s build environment is controlled by a configuration file called .travis.yml that lives in our repository. Jan Tilly has done all the hard work of configuring a container-based R on Travis, so I just follow his example here. His configuration is intended for people writing R packages. We’re just linting code, not running it, so things are more straightforward. Here’s what .travis.yml looks like:

# .travis.yml using container-based infrastructure
# travis configuration file courtesy of Jan Tilly:
# https://github.com/jtilly/R-travis-container-example

# use c as catch-all language
language: c

# use containers
sudo: false

# only run for pushes to master branch
branches:
  only:
   - master

# install R: use r-packages-precise (https://cran.r-project.org/bin/linux/ubuntu/precise/)
# as source, which is whitelisted (https://github.com/travis-ci/apt-source-whitelist/)
addons:
  apt:
    sources:
    - r-packages-precise
    packages:
    - r-base-dev
    - r-recommended
    - pandoc

# cache local R libraries directory:
cache:
  directories:
    - ~/Rlib

# install the package and dependencies:
# - create directory for R libraries (if not already exists)
# - create .Renviron with location of R libraries
# - define R repository in .Rprofile
# - add .travis.yml to .Rbuildignore
# - install devtools if not already installed
# - install covr if not already installed
# - update all installed packages

install:
  - mkdir -p ~/Rlib
  - echo 'R_LIBS=~/Rlib' > .Renviron
  - echo 'options(repos = "http://cran.rstudio.com")' > .Rprofile
  - echo '.travis.yml' > .Rbuildignore
  - Rscript -e 'if(!"devtools" %in% rownames(installed.packages())) { install.packages("devtools", dependencies = TRUE) }'
  - Rscript -e 'if(!"covr" %in% rownames(installed.packages())) { install.packages("covr", dependencies = TRUE) }'
  - Rscript -e 'update.packages(ask = FALSE, instlib = "~/Rlib")'


# Lint
script:
  - ./travis-linter.sh

As you can see in the “addons” and “install” segments, we get a whole R setup here, including all of devtools and the covr test-coverage package. We don’t really need these for what we’re doing, and so this configuration file could be simplified even more than I’ve already done. I’ve left them there partly to remind you that you can use this environment for more challenging coding tasks. In any event, once R is set up and the additional packages are compiled in our container’s local directory, we tell Travis (in the script: section) to run a very simple shell script. It takes any .Rmd files in the top-level directory and puts them through lintr, returning a non-zero exit status if anything goes wrong. It looks like this:

#!/bin/bash
set -e

exitstatus=0

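# Lint every R Markdown file in the top-level directory.
for file in *.Rmd
do
    # First call: print any lints so they show up in the Travis log.
    Rscript -e "lintr::lint(\"$file\")"
    # Second call: count the bytes of lint output; zero means a clean file.
    outputbytes=$(Rscript -e "lintr::lint(\"$file\")" | grep ^ | wc -c)
    if [ "$outputbytes" -gt 0 ]
    then
        exitstatus=1
    fi
done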

exit $exitstatus

It’s very straightforward, but a script like this can easily be extended to perform much more complicated tasks. Rscript is called twice so that you can see the output (if any) in your Travis log (the first call) and to generate the exit status that lets Travis decide whether your build has failed (the second call). (There’s certainly got to be a more efficient way to do this than effectively linting the file twice. Probably I should just pipe the first to a file and cat the output if there is any, while setting the exit status at the same time.)
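
The single-pass idea floated above might look something like the following. It is only a sketch, not the script I used for the course: the function names run_lint, check_file, and lint_all are hypothetical, and it assumes, as the original script does, that lintr prints nothing to standard output when a file is clean.

```shell
#!/bin/bash
# Sketch: lint each file once, capturing the output in a temporary file.
# run_lint, check_file and lint_all are hypothetical helper names.

run_lint() {
    # Lint one file with lintr; lints (if any) go to standard output.
    Rscript -e "lintr::lint(\"$1\")"
}

check_file() {
    # Returns 1 if linting "$1" produced output or Rscript failed, else 0.
    local tmp result=0
    tmp=$(mktemp)
    run_lint "$1" > "$tmp" || result=1
    if [ -s "$tmp" ]; then
        # Show the captured lints in the build log.
        cat "$tmp"
        result=1
    fi
    rm -f "$tmp"
    return "$result"
}

lint_all() {
    # Check every file given as an argument; remember any failure.
    local exitstatus=0 file
    for file in "$@"; do
        check_file "$file" || exitstatus=1
    done
    return "$exitstatus"
}

# Usage in travis-linter.sh:
#   lint_all *.Rmd
#   exit $?
```

Keeping the Rscript call in its own small function means the rest of the logic can be tried out locally with a stand-in for run_lint, without waiting on R.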

Over at Travis, you get the results of all of this activity on the log screen for your repository. The first time it runs it takes about ten minutes, because the local R packages have to be built. But then those packages get cached, so subsequent runs take less than a minute. When lintr finds something to complain about, the script exits with a status code of 1, so Travis reports that the build failed.

In my example build, lintr is complaining that I’ve used = as an assignment operator in R instead of <-, in violation of the style rules. If we fix the errors in our text editor, commit the changes in git, and push them to the repo, then Travis notices, reruns everything, and gives you the good news.
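
To see a complaint like this before pushing anything, one could build a tiny test file and lint it locally. The sketch below is illustrative only: the function name make_example_rmd and the file name example.Rmd are hypothetical, and the commented usage assumes Rscript and the lintr package are installed on the local machine.

```shell
#!/bin/bash
# Sketch: write a small R Markdown file whose code chunk assigns with "=",
# the style problem described in the text.
# make_example_rmd and example.Rmd are hypothetical names for illustration.

make_example_rmd() {
    # $1: directory in which to create example.Rmd
    local dir="$1"
    # Build the three-backtick chunk delimiter from single backticks so
    # this script can itself be pasted into a Markdown document safely.
    local tick='`'
    local fence="${tick}${tick}${tick}"
    {
        echo "Some notes about the data."
        echo
        echo "${fence}{r}"
        echo "x = c(1, 2, 3)"
        echo "${fence}"
    } > "$dir/example.Rmd"
}

# Usage (assumes Rscript and lintr are available locally):
#   make_example_rmd .
#   Rscript -e 'lintr::lint("example.Rmd")'
```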

The upshot is that if people are working with .Rmd files and using GitHub, they can set up Travis, drop the .travis.yml configuration and travis-linter.sh script into their repo, and have Travis-CI automatically and quickly check their code before they submit it.
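
For a repository that only needs linting, the install steps could plausibly be pared down to lintr alone. The following is a sketch under that assumption: the helper names write_r_config and install_if_missing are hypothetical, as is the idea of collecting the steps in a script rather than listing them one by one in .travis.yml.

```shell
#!/bin/bash
# Sketch: the install steps from .travis.yml, reduced to what linting needs.
# write_r_config and install_if_missing are hypothetical helper names.

write_r_config() {
    # $1: project directory, $2: directory for the cached R library
    local project="$1" libdir="$2"
    mkdir -p "$libdir"
    # Point R at the cached library and set a default CRAN mirror.
    echo "R_LIBS=$libdir" > "$project/.Renviron"
    echo 'options(repos = "http://cran.rstudio.com")' > "$project/.Rprofile"
}

install_if_missing() {
    # $1: name of an R package; install it only when it is not yet present.
    Rscript -e "if (!\"$1\" %in% rownames(installed.packages())) install.packages(\"$1\")"
}

# Usage, from the repository root on Travis:
#   write_r_config . ~/Rlib
#   install_if_missing lintr
```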
