Testing an R package against a Julia package on github actions
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Something I have learned the hard way: every single line of code that I write should be accompanied by a unit test. Unit tests are not only useful for me as a developer to spot bugs, but a thorough sequence of unit tests should also increase others’ confidence in the quality of my code.
For statistical methods and algorithms, one great way to test if everything is working as intended is to test one’s own implementation of an algorithm against someone else’s.

Figure 1: Abraham is severely tested. (Sacrifice of Isaac, Caravaggio, ca. 1598. Barbara Piasecka-Johnson Collection.)
This idea is expressed in ropensci’s statistical software standards:
G5.4 Correctness tests to test that statistical algorithms produce expected results to some fixed test data sets (potentially through comparisons using binding frameworks such as RStata).
- G5.4b For new implementations of existing methods, correctness tests should include tests against previous implementations. Such testing may explicitly call those implementations in testing, preferably from fixed-versions of other software, or use stored outputs from those where that is not possible.
Preferably, these unit tests should be automized via continuous integration tools such as github actions. CI is super useful – a CI workflow might e.g. trigger to run all tests at every commit, so I definitely won’t forget to run them. Second, CI integration via github actions communicates to my package’s user that I have actually and successfully run all unit tests on the latest development version. Last, all tests are run “in the cloud”, so I am not blocking my own computer for (potentially) multiple minutes.1
As expressed in ropensci’s guideline G5.4b, in scientific computing, the same algorithm might only be implemented just once, or, in the lucky case that another implementation exists, it might be written in another language.
This applies to the “fast” cluster bootstrap algorithm implemented in fwildclusterboot. To my knowledge, there exists an R implementation (fwildclusterboot
), the original Stata version (boottest), and recently, David Roodman has written a Julia package, WildBootTests.jl.
Up until now, fwildclusterboot's
test sequence has been divided in two separate parts. Part one’s job was to test if the methods for different regression packages in R producec consistent results (“internal consistency”). These tests could easily be run on CRAN and github actions, as they only required fwildclusterboot
and a working R distribution.
Part two of the test sequence tested fwildclusterboot's
“external validity” by running the bootstrap in both R and Stata (via boottest & the RStata package). Before a CRAN submission, I would run all these tests on my local machine. Of course, the unit tests involving Stata could not be run on CRAN, and I did not think (and am now uncertain) that these tests could be automatized via github actions.2
As it happend, my Stata student license expired the other week, and when I checked StataCorps website for renewal fees, Stata – for non-academics – had a 840 € price tag for a one-year license. To me, this seemed to be a hefty price tag for running a few unit tests for fwildclusterboot
, and was a good incentive to update my unit tests.
Luckily, WildBootTests.jl
is now around, and as Julia is open source, I was fairly certain that I could transfer my “external” tests from Stata to Julia and deploy all tests on github actions.
How to set up a workflow that tests an R package against a Julia package
So, how does a workflow to test an R package against a Julia package looks like?
First of all, I have used the skeleton of fwildclusterboot
to produce a wrapper around WildBootTests.jl
, wildboottestjlr. All data pre-processing is handled by functions copied from fwildclusterboot
. For running the bootstrap, R sends all required objects to Julia (via the excellent JuliaConnectoR
package, on which I will blog in the future). In a second step, I have added wildboottestjlr
to fwildclusterboot's
dependencies and have initiated a github actions workflow via usethis::use_github_actions()
. (Note that it is in fact not neccessariy to write an entire R wrapper-package for the Julia algorithm you want to test. wildboottestjlr
exists because WildBootTests.jl
is actually a magnitude faster than fwildclusterboot
and because it has many additional feature, as e.g. a wild cluster bootstrap for instrumental variables.)
The created workflow yaml looks like this:
# For help debugging build failures open an issue on the RStudio community with the 'github-actions' tag. # https://community.rstudio.com/new-topic?category=Package%20development&tags=github-actions on: push: branches: - main - master pull_request: branches: - main - master name: R-CMD-check jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} name: ${{ matrix.config.os }} (${{ matrix.config.r }}) strategy: fail-fast: false matrix: config: - {os: windows-latest, r: 'release'} - {os: macOS-latest, r: 'release'} - {os: ubuntu-20.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"} env: R_REMOTES_NO_ERRORS_FROM_WARNINGS: true RSPM: ${{ matrix.config.rspm }} GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} steps: - uses: actions/checkout@v2 - uses: r-lib/actions/setup-r@v2 with: r-version: ${{ matrix.config.r }} - uses: r-lib/actions/setup-pandoc@v2 - name: Query dependencies run: | install.packages('remotes') saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2) writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version") shell: Rscript {0} - name: Cache R packages if: runner.os != 'Windows' uses: actions/cache@v2 with: path: ${{ env.R_LIBS_USER }} key: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }} restore-keys: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1- - name: Install system dependencies if: runner.os == 'Linux' run: | while read -r cmd do eval sudo $cmd done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))') - name: Install dependencies run: | remotes::install_deps(dependencies = TRUE) remotes::install_cran("rcmdcheck") shell: Rscript {0} # install julia - uses: julia-actions/setup-julia@v1 # add julia to renviron - name: Create and populate .Renviron file run: echo JULIA_BINDIR= "${{ env.juliaLocation }}" >> ~/.Renviron shell: bash # install WildBootTests.jl - name: install WildBootTests.jl run: julia -e 'using Pkg; Pkg.add("WildBootTests")' # use shell bash to ensure consistent behavior across OS shell: bash - name: Check env: _R_CHECK_CRAN_INCOMING_REMOTE_: false run: rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran"), error_on = "warning", check_dir = "check") shell: Rscript {0} - name: Upload check results if: failure() uses: actions/upload-artifact@main with: name: ${{ runner.os }}-r${{ matrix.config.r }}-results path: check
In a nutshell, this workflow installs R and fwildclusterboot
in a clean ubuntu environment, conducts a cmd-check and initiates all of fwildclusterboot's
unit tests.
To enable testing against Julia and WildBootTests.jl
, it further adds three ‘steps’ to the workflow: the workflow has to install Julia
, WildBootTests.jl
(and all its Julia dependencies) and link R and Julia so that JuliaConnectoR
could communicate between the two languages.
This is easily be achieved by adding the lines below to the yaml file create by usethis::use_github_action()
:
# install julia - uses: julia-actions/setup-julia@v1 # add julia to renviron - name: Create and populate .Renviron file run: echo JULIA_BINDIR= "${{ env.juliaLocation }}" >> ~/.Renviron shell: bash # install WildBootTests.jl - name: install WildBootTests.jl run: julia -e 'using Pkg; Pkg.add("WildBootTests")' # use shell bash to ensure consistent behavior across OS shell: bash
Placing these few lines in the workflow yaml before the cmd check is triggered completes the workflow. It now downloads R and Julia, installs fwildclusterboot
, wildboottestjlr
and WildBootTests.jl
, links R and Julia, and runs all of fwildclusterboot's
unit tests. The remaining step is to push to github, and then all unit tests are running!
The unit tests for fwildclusterboot currently run for about 30 minutes.↩︎
I have not implemented test that compare
fwildclusterboot
against other R implementations of the wild bootstrap, mainly due to performance reasons. As resampling methods are usually time intensive and fwildclusterboot is much faster than other R implementations, running unit tests against other R implementations would simply take a very long time.↩︎
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.