Setting up continuous multi-platform R package building, checking and testing with R-Hub, Docker and GitLab CI/CD for free, with a working example
Introduction
In the previous post, we looked at how to easily automate R analysis, modeling, and development work for free using GitLab’s CI/CD. Together with the fantastic R-hub project, we can use GitLab CI/CD to do much more.
In this post, we will take it to the next level by using R-hub to test our development work on many different platforms such as multiple Linux setups, MS Windows and MacOS. We will also show how to automate and continuously execute those multiplatform checks using GitLab CI/CD integration and Docker images.
For those too busy to read, we also provide a working example implementation in a public GitLab repository.
Using R-hub to build, check and test our R package on many platforms
R-hub is a project supported by the R Consortium that offers free R CMD check as a service on different platforms. This enables us to quickly and efficiently check the R package we are developing to make sure it passes all necessary checks on several platforms. As an added bonus, the checks seem to run very quickly, which means we can have our results at hand in a few minutes.
I also recommend that you read the why should you care about R-hub? blog post for more info.
Getting started with R-hub
Getting started with R-hub is also very simple and can be achieved in 3 lines of code, from a package directory or an RStudio project for a package:
# Install the package
install.packages("rhub")

# Validate your e-mail address
# Provide the email argument if not detected automatically
rhub::validate_email()

# In an interactive session,
# this will offer a list of platforms to choose from
cr <- rhub::check()
Your validated_emails.csv file should be saved into the rappdirs::user_data_dir("rhub", "rhub") directory once validate_email() has run successfully.
For more details on getting started, the Get started with rhub post has you covered in detail.
Using and evaluating R-hub check results via R scripts
For continuous integration purposes, we may want to evaluate the results of the check based on the number of errors, warnings, and notes that the check gives for each platform. To achieve this goal, we need to tackle 2 issues:
Getting the results in a non-interactive context
In a non-interactive session, R-hub will run the check asynchronously and end the process we used to request the service, to free up resources. This is great but can pose some challenges in the CI context, as we would have to keep a job around to repeatedly query the R-hub job’s status and process the results once done, or implement a much smarter reporting solution.
Luckily, since maximizing efficiency is not our top concern here, a simple workaround is to execute the check as if in an interactive session via the CI tool. This provides us with the actual results of the check as soon as it is done and also writes the log into our CI’s run log, at the obvious cost of keeping the process blocked while waiting for the check to finish on R-hub’s servers.
Processing the check results
The public methods for an rhub_check object currently seem to provide only side-effecting results, such as printing them in various levels of detail and returning self, so investigating results via code may be challenging.
The simplest current solution is to use the object’s private fields to access the results in the desired format. The example below looks at the status_ private field and returns a data frame with the number of errors, warnings, and notes for each platform. For an object containing only 1 check result it can look as follows:
statuses <- cr[[".__enclos_env__"]][["private"]][["status_"]]

res <- do.call(rbind, lapply(statuses, function(thisStatus) {
  data.frame(
    platform = thisStatus[["platform"]][["name"]],
    errors = length(thisStatus[["result"]][["errors"]]),
    warnings = length(thisStatus[["result"]][["warnings"]]),
    notes = length(thisStatus[["result"]][["notes"]]),
    stringsAsFactors = FALSE
  )
}))

res
##             platform errors warnings notes
## 1 debian-gcc-release      0        0     0
Now we have a data frame which we can use to signal the CI/CD job to succeed or fail based on our wishes. For example, if we want to fail if the check discovered any notes, warnings or errors, a simple statement like the following will suffice:
if (any(colSums(res[2L:4L]) > 0)) {
  stop("Some checks resulted in errors, warnings or notes.")
}
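Failing on any note may be stricter than some projects need. As a minimal sketch, assuming a data frame with the errors, warnings and notes columns built above, a small helper could make the strictness configurable. The name check_passed, its fail_on argument and the mock data are hypothetical and only serve as illustration; they are not part of rhub.

```r
# Hypothetical helper: TRUE when no platform has findings
# in any of the columns listed in fail_on
check_passed <- function(res, fail_on = c("errors", "warnings", "notes")) {
  stopifnot(all(fail_on %in% names(res)))
  all(colSums(res[, fail_on, drop = FALSE]) == 0)
}

# Mock results for illustration only, not real R-hub output
mock_res <- data.frame(
  platform = c("debian-gcc-release", "windows-x86_64-devel"),
  errors = c(0L, 0L),
  warnings = c(0L, 0L),
  notes = c(0L, 1L),
  stringsAsFactors = FALSE
)

# Strict policy: the single note makes the check fail
print(check_passed(mock_res))

# Relaxed policy: notes are tolerated, only errors and warnings fail
print(check_passed(mock_res, fail_on = c("errors", "warnings")))
```

In a CI script, the result could then be wrapped as if (!check_passed(res)) stop(...), so that the job fails only on the chosen kinds of findings.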
Putting it together into a script
Now that we have solved the above challenges, we can put it all together into a script that can be later used in the context of a CI/CD job:
# Retrieve passed command line arguments
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 1L) {
  stop("Incorrect number of args, needs 1: platform (string)")
}
platform <- args[[1L]]

# Check if passed platform is valid
if (!is.element(platform, rhub::platforms()[[1L]])) {
  stop(paste(
    "Given platform not in rhub::platforms()[[1L]]:",
    platform
  ))
}

# Run the check on the selected platform
# Use show_status = TRUE to wait for results
cr <- rhub::check(platform = platform, show_status = TRUE)

# Get the statuses from private field status_
statuses <- cr[[".__enclos_env__"]][["private"]][["status_"]]

# Create and print a data frame with results
res <- do.call(rbind, lapply(statuses, function(thisStatus) {
  data.frame(
    platform = thisStatus[["platform"]][["name"]],
    errors = length(thisStatus[["result"]][["errors"]]),
    warnings = length(thisStatus[["result"]][["warnings"]]),
    notes = length(thisStatus[["result"]][["notes"]]),
    stringsAsFactors = FALSE
  )
}))
print(res)

# Fail if any errors, warnings or notes found
if (any(colSums(res[2L:4L]) > 0)) {
  stop("Some checks had errors, warnings or notes. See above for details.")
}
Preparing a private docker image to use with R-hub
If you are new to Docker, Colin Fay has you covered with his Introduction to Docker for R Users blog post.
Creating and testing an image
Thanks to all the hard work done by the maintainers of the Rocker images, our task of creating an image suitable for use with R-hub is very simple. Essentially we only need 2 additions to the r-base image:
- The rhub package and a few system dependencies
- A validated_emails.csv file placed into the correct directory, providing R-hub with the information on the validated e-mail to use for the checks
The Dockerfile linked in the TL;DR section below can be used to create such an image for yourself. Just make sure you have your validated_emails.csv file present in the resources folder when running docker build.
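Getting validated_emails.csv into the resources folder can itself be scripted from R. The sketch below is only one possible way to do it: the helper name stage_validated_emails and its arguments are hypothetical, and the default source directory assumes the file was saved where the post says validate_email() puts it (this default needs the rappdirs package).

```r
# Hypothetical helper: copy validated_emails.csv into the Docker build context
# source_dir defaults to the directory mentioned earlier in the post;
# the default is only evaluated when source_dir is not supplied
stage_validated_emails <- function(
  dest_dir = "resources",
  source_dir = rappdirs::user_data_dir("rhub", "rhub")
) {
  src <- file.path(source_dir, "validated_emails.csv")
  if (!file.exists(src)) {
    stop("No validated_emails.csv found in: ", source_dir)
  }
  # Create the destination folder if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  # Returns TRUE when the copy succeeded
  file.copy(src, file.path(dest_dir, "validated_emails.csv"), overwrite = TRUE)
}
```

Calling stage_validated_emails() from the directory that contains the Dockerfile, before running docker build, should leave the file in place for the build. Since the file contains the R-hub token, it is probably best kept out of version control.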
To test our docker image, we can use a command like the following to create a container and run R within it in an interactive session:
docker run --rm -it <hub-username>/<repo-name>:<tag> R
Now we can see the list of validated e-mails in that R session:
rhub::list_validated_emails()
##             email                token
## 1 [email protected] 00000000000000000000
Pushing the image into a private repository
Now that we have our image created, we need to push it to a repository for GitLab CI to be able to use it. Normally this is very simple:
docker push <hub-username>/<repo-name>:<tag>
However, as we are storing some relatively sensitive data in our image, namely our R-hub token, we should probably make this image private. Thanks to Dockerhub, this process is very easy: just click the proper buttons as shown in this post in the Dockerhub docs. Note that a free Dockerhub account has only 1 private repository available.
Creating a GitLab CI/CD pipeline
For an introduction to using GitLab CI/CD for R work, look at the previous post on How to easily automate R analysis, modeling and development work using CI/CD, with working examples
Setting up a pipeline with .gitlab-ci.yml
Now that we have our private Docker image and the script to run and evaluate our R-hub checks, all that is left is to create and set up a CI/CD pipeline. For GitLab CI/CD, this means creating a .gitlab-ci.yml file in the root of our GitLab repository directory. Without much extra talk, that file can look as follows:
image: index.docker.io/jozefhajnala/rhub:rbase

stages:
  - check

variables:
  _R_CHECK_CRAN_INCOMING_: "false"
  _R_CHECK_FORCE_SUGGESTS_: "true"

before_script:
  - apt-get update

check_ubuntu:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "ubuntu-gcc-release"

check_fedora:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "fedora-clang-devel"

check_mswin:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "windows-x86_64-devel"

check_macos:
  stage: check
  script:
    - Rscript inst/rhubcheck.R "macos-elcapitan-release"
This file will make sure that:
- The CI/CD jobs start from the image we have created
- The pipeline has one stage named check
- A couple of environment variables are set for R
- Four jobs are run: check_ubuntu, check_fedora, check_mswin, and check_macos, each of them using Rscript to execute an R script stored under inst/rhubcheck.R, with different arguments specifying the platform to check on
Authenticating to use a private repository
Since we have made our Docker image private, GitLab will not be able to use it out of the box. We need to provide it with information on how to authenticate against Dockerhub to be able to pull the private image. There are a few ways to reach this goal; I have set up a variable via the Settings -> CI/CD -> Variables option in GitLab’s web UI. The variable name should be DOCKER_AUTH_CONFIG and the value:
{
  "auths": {
    "registry.example.com:5000": {
      "auth": "bXlfdXNlcm5hbWU6bXlfcGFzc3dvcmQ="
    }
  }
}
Where:
- "registry.example.com:5000" is replaced by our registry, for example "index.docker.io"
- the value for "auth" is replaced by a base64-encoded version of our "<username>:<password>", which we can retrieve for example using R:
base64enc::base64encode(charToRaw("my_username:my_password"))
## [1] "bXlfdXNlcm5hbWU6bXlfcGFzc3dvcmQ="
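To avoid assembling the variable value by hand, the encoding step and the JSON structure above can be combined. The following is a minimal sketch rather than an official recipe: the helper name make_docker_auth_config is hypothetical, it relies on the same base64enc package, and it assumes the registry name and credentials contain no characters that would need JSON escaping.

```r
# Hypothetical helper: build a DOCKER_AUTH_CONFIG value for a single registry
make_docker_auth_config <- function(registry, username, password) {
  # base64-encode "<username>:<password>", as described above
  auth <- base64enc::base64encode(charToRaw(paste0(username, ":", password)))
  # Fill the registry and the encoded credentials into the JSON structure
  sprintf('{"auths": {"%s": {"auth": "%s"}}}', registry, auth)
}

# Placeholder credentials, for illustration only
config <- make_docker_auth_config("index.docker.io", "my_username", "my_password")
cat(config, "\n")
```

The printed string can then be pasted as the value of the DOCKER_AUTH_CONFIG variable in GitLab’s web UI. Real credentials are better typed into an interactive session than saved in a script.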
And that is all! We are now ready to run our checks using a Docker image stored in a private repository. Once we push the .gitlab-ci.yml and inst/rhubcheck.R files to a GitLab repository, the pipeline will be automatically executed every time we push a commit to that repository.
TL;DR: Just show it to me in action
In case you are only interested in seeing the CI/CD pipeline with R-hub implemented for an R package, look at:
- The .gitlab-ci.yml file for the jhaddins package on branch experimental
- The Dockerfile used to build the image used in the above .gitlab-ci.yml
- An R script that runs the checks via R-hub and evaluates the results
- An example of a successful run with checks on 3 platforms
- An example output of a check on Windows provided by R-hub
References
R-hub
- The why should you care about R-hub? blog post
- Get started with R-Hub
- R-Hub on the R Consortium website
- R-Hub’s reference online
- Documentation on rhub_check R6 objects
R work and GitLab
- Blog post on automating R analysis, modeling and development using CI/CD, with working examples
- GitLab Continuous Integration documentation
- GitLab CI/CD environment variables
- Using a private container registry with GitLab CI/CD
R work and Docker
- Docker images for R on the Rocker Project
- Colin Fay’s Introduction to Docker for R Users
- Get started with Docker official documentation
- Using private repositories in DockerHub