Setting up continuous multi-platform R package building, checking and testing with R-Hub, Docker and GitLab CI/CD for free, with a working example

[This article was first published on Jozef's Rblog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

In the previous post, we looked at how to easily automate R analysis, modeling, and development work for free using GitLab’s CI/CD. Together with the fantastic R-hub project, we can use GitLab CI/CD to do much more.

In this post, we will take it to the next level by using R-hub to test our development work on many different platforms such as multiple Linux setups, MS Windows and MacOS. We will also show how to automate and continuously execute those multiplatform checks using GitLab CI/CD integration and Docker images.

For those too busy to read, we also provide a working example implementation in a public GitLab repository.

Using R-hub to build, check and test our R package on many platforms

R-hub is a project supported by the R Consortium and offers free R CMD check as a service on different platforms. This enables us to quickly and efficiently check the R package you are developing to make sure it passes all necessary checks on several platforms. As an added bonus, the checks seem to be running in a very short time span, which means we can have your results at hand in a few minutes.

I also recommend that you read the why should you care about R-hub? blog post for more info.

CI/CD running checks on multiple platforms with R-hub

CI/CD running checks on multiple platforms with R-hub

Getting started with R-hub

Getting started with R-hub is also very simple and can be achieved in 3 lines of code, from a package directory or an RStudio project for a package:

# Install the package
install.packages("rhub")

# Validate your e-mail address
# Provide the email argument if not detected automatically
rhub::validate_email()

# In an interactive session, 
# this will offer a list of platforms to choose from
cr <- rhub::check()

Your validated_emails.csv should be saved into rappdirs::user_data_dir("rhub", "rhub") directory once validate_email() was run successfully.

For more details on getting started, the Get started with rhub post has you covered in detail.

Using and evaluating R-hub check results via R scripts

For continuous integration purposes, we may want to evaluate the results of the check based on the number of errors, warnings, and notes that the check gives for each platform. To achieve this goal, we need to tackle 2 issues:

Getting the results in a non-interactive context

In a non-interactive session, R-hub will run the check asynchronously and end our process used to request the service to free up resources. This is great but can pose some challenges in the CI context, as we would have to keep around a job to repeatedly query the R-hub job’s status and processing the results once done. Or implement a much smarter reporting solution.

Luckily, since for this purpose maximizing efficiency is not our top concern, the simple workaround is to execute the check as-if in an interactive session via the CI tool. This will provide us with the actual results of the check as soon as done and also write the log into our CI’s run log, at the obvious cost of having the process blocked while waiting for the check to finish on R-hub’s servers.

Processing the check results

The public methods for an rhub_check object currently seem to provide only side-effecting results such as printing them in various levels of detail and returning self, so investigating results via code may be challenging.

The simplest current solution is to use the object’s private fields to access the results in the desired format. The below example looks at the status_ private field and returns a data frame with the number of errors, warnings, and notes for each. For an object containing only 1 check result it can look as follows:

statuses <- cr[[".__enclos_env__"]][["private"]][["status_"]]
res <- do.call(rbind, lapply(statuses, function(thisStatus) {
  data.frame(
    plaform  = thisStatus[["platform"]][["name"]],
    errors   = length(thisStatus[["result"]][["errors"]]),
    warnings = length(thisStatus[["result"]][["warnings"]]),
    notes    = length(thisStatus[["result"]][["notes"]]),
    stringsAsFactors = FALSE
  )
}))
res
##              plaform errors warnings notes
## 1 debian-gcc-release      0        0     0

Now we have a data frame which we can use to signal the CI/CD job to succeed or fail based on our wishes. For example, if we want to fail if the check discovered any notes, warnings or errors, a simple statement like the following will suffice:

if (any(colSums(res[2L:4L]) > 0)) {
  stop("Some checks resulted in errors, warnings or notes.")
}

Putting it together into a script

Now that we have solved the above challenges, we can put it all together into a script that can be later used in the context of a CI/CD job:

# Retrieve passed command line arguments
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 1L) {
  stop("Incorrect number of args, needs 1: platform (string)")
}
platform <- args[[1L]]

# Check if passed platform is valid 
if (!is.element(platform, rhub::platforms()[[1L]])) {
  stop(paste(
    "Given platform not in rhub::platforms()[[1L]]:",
    platform
  ))
}

# Run the check on the selected platform
# Use show_status = TRUE to wait for results
cr <- rhub::check(platform = platform, show_status = TRUE)

# Get the statuses from private field status_
statuses <- cr[[".__enclos_env__"]][["private"]][["status_"]]

# Create and print a data frame with results
res <- do.call(rbind, lapply(statuses, function(thisStatus) {
  data.frame(
    plaform  = thisStatus[["platform"]][["name"]],
    errors   = length(thisStatus[["result"]][["errors"]]),
    warnings = length(thisStatus[["result"]][["warnings"]]),
    notes    = length(thisStatus[["result"]][["notes"]]),
    stringsAsFactors = FALSE
  )
}))
print(res)

# Fail if any errors, warnings or notes found
if (any(colSums(res[2L:4L]) > 0)) {
  stop("Some checks had errors, warnings or notes. See above for details.")
}

Preparing a private docker image to use with R-hub

If you are new to Docker, Colin Fay has you covered with his Introduction to Docker for R Users blog post.

Creating and testing an image

Thanks to all the hard work done by the maintainers of the Rocker images, our task with creating an image suitable for use with R hub is very simple. Essentially we only need 2 additions to the r-base image:

  1. The rhub package and a few system dependencies
  2. A validated_emails.csv file placed into the correct directory, providing R-hub with the information on validated e-mail to use for the checks

The following Dockerfile can be used the create such an image for yourself. Just make sure you have your validated_emails.csv file present in the resources folder when running docker build.

To test our docker image, we can use a command like the following to create a container and run R within it in an interactive session:

docker run --rm -it <hub-username>/<repo-name>:<tag> R

Now we can see the list of validated e-mails in that R session:

rhub::list_validated_emails()
##                  email                token
## 1 [email protected] 00000000000000000000

Pushing the image into a private repository

Now that we have our image created, we need to push it to a repository for GitLab CI to be able to use it. Normally this is very simple:

docker push <hub-username>/<repo-name>:<tag>

However as we are storing some relatively sensitive data in our image, namely our R-hub token we should probably make this image private. Thanks to Dockerhub, this process is very easy - just click the proper buttons as shown in this post in the Dockerhub docs. Note that for free a Dockerhub user has only 1 private repository available.

Creating a GitLab CI/CD pipeline

For an introduction to using GitLab CI/CD for R work, look at the previous post on How to easily automate R analysis, modeling and development work using CI/CD, with working examples

Setting up a pipeline with .gitlab-ci.yml

Now, we are ready with our private Docker image and the script to run and evaluate our R-hub checks, all that is left is to create and setup a CI/CD pipeline. For GitLab CI/CD, this means creating a .gitlab-ci.yml file in the root of our GitLab repository directory. Without much extra talk, that file can look as follows:

image: index.docker.io/jozefhajnala/rhub:rbase

stages:
  - check

variables:
  _R_CHECK_CRAN_INCOMING_: "false"
  _R_CHECK_FORCE_SUGGESTS_: "true"

before_script:
  - apt-get update

check_ubuntu:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "ubuntu-gcc-release"

check_fedora:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "fedora-clang-devel"

check_mswin:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "windows-x86_64-devel"

check_macos:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "macos-elcapitan-release"

This file will make sure that:

  1. The CI/CD jobs start from the image we have created
  2. Will have one stage named check
  3. Set a couple of environment variables for R
  4. Run three jobs check_ubuntu, check_fedora, check_mswin, and check_macos - each of them by using Rscript to execute an R script stored under inst/rhubcheck.R, with different arguments specifying the platform to check on

Authenticating to use a private repository

Since we have made our Docker image private, GitLab will not be able to use it out of the box, we need to provide it with information on how to authenticate against Dockerhub to be able to pull the private image. There are a few ways to reach this goal, I have used the one to setup a variable via the Settings -> CI/CD -> Variables option in GitLab’s web UI:

Creating CI/CD variable with GitLab

Creating CI/CD variable with GitLab

The variable name should be DOCKER_AUTH_CONFIG and the value:

{
  "auths": {
    "registry.example.com:5000": {
      "auth": "bXlfdXNlcm5hbWU6bXlfcGFzc3dvcmQ="
    }
  }
}

Where

  • "registry.example.com:5000" is replaced by our registry, for example "index.docker.io"
  • the value for "auth" is replaced by a base64-encoded version of our "<username>:<password>", which we can retrieve for example using R:
base64enc::base64encode(charToRaw("my_username:my_password"))
## [1] "bXlfdXNlcm5hbWU6bXlfcGFzc3dvcmQ="

And that is all! We are now ready to run our checks using a Docker image stored in a private repository. Once we push the .gitlab-ci.yml and inst/rhubcheck.R files to a GitLab repository, the pipeline will be automatically executed every time we push a commit to that repository.

TL;DR: Just show it to me in action

In case you are only interested in seeing the CI/CD pipeline with R-hub implemented for an R package, look at:

References

R work and Docker

To leave a comment for the author, please follow the link and comment on their blog: Jozef's Rblog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)