
Continuous deployment of package documentation with pkgdown and Travis CI

[This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers.]

The problem

pkgdown is an R package that can create a beautiful website for your own R package. Built and maintained by Hadley Wickham and his gang of prolific contributors, this package parses the documentation files and vignettes of your package and builds a website from them with a single command: build_site(). This is what such a pkgdown-generated website looks like in action.

The HTML files that pkgdown generates are stored in a docs folder. If your source code is hosted on GitHub, you just have to commit this folder, navigate to the Settings panel of your GitHub repo and enable GitHub Pages to serve the docs folder at https://<name_or_org>.github.io/<package_name>. It's remarkably easy and a great first step. In fact, this is how the pkgdown-built website for pkgdown itself is hosted.

Although it's an elegant flow, this approach has some issues. First, you're committing automatically generated files even though the source required to build them is already stored in the package, and committing generated files to your repo is generally not good practice. What if you update your documentation and commit the changes without re-rendering the pkgdown website locally? The docs folder will be out of sync with your source, and the website will not reflect the latest changes. Second, there is no easy way to control when you release your documentation. Maybe you want to work off the master branch, but you don't want to update the docs until you've done a CRAN release and a corresponding GitHub release. With the ad-hoc approach of committing the docs folder, this would be tedious.

The solution

There's a quick fix for these concerns, though: use Travis CI. Travis CI is a continuous integration tool that is free for open-source projects. When configured properly, Travis picks up on any changes you push to your repo. For R packages, Travis is typically used to automatically run the battery of unit tests and to check whether the package builds on several previous versions of R, among other things. But that's not all: Travis can also do deployments. In this case, I'll show you how to set up Travis so it automatically builds the pkgdown website for you and commits the web files to the gh-pages branch, which GitHub then uses to host your package website. To see how this is set up for an R package in production, check out the testwhat package on GitHub, which we use at DataCamp to grade student submissions and give useful feedback. In this tutorial, I will set up pkgdown for the tutorial package, another of DataCamp's open-source projects, which makes your blogs interactive.

The steps

  1. Go to https://travis-ci.org and link your GitHub account.
  2. On your Travis CI profile page, enable Travis for the project repo that you want to build the documentation for. The next time you push a change to your GitHub project, Travis will be notified and will try to build your project. More on that later.

  3. In the DESCRIPTION file of your R package, add pkgdown to the Suggests list of packages. This ensures that when Travis builds and installs your package, it also installs pkgdown, so we can use it to build the website.

  4. In the .gitignore file, make sure that the entire docs folder is ignored by git: add the line docs/*.
  5. Add a file with the name .travis.yml to your repo’s root folder, with the following content:

    language: r
    cache: packages
    
    after_success:
      - Rscript -e 'pkgdown::build_site()'
    
    deploy:
      provider: pages
      skip-cleanup: true
      github-token: $GITHUB_PAT
      keep-history: true
      local-dir: docs
      on:
        branch: master
    

    This configuration file is very short, but it does a lot. Jeroen Ooms and Jim Hester maintain a default Travis build configuration for R packages that handles most of the work out of the box: a Travis config file with only the language: r field would already build, test, and check your package. Let's go over the other fields:

    • cache: packages tells Travis to cache installed packages between builds. This significantly speeds up your build if your package has dependencies.
    • after_success tells Travis which steps to take once the R CMD check step has succeeded. In our case, we're telling Travis to build the pkgdown website, which creates a docs folder on Travis's servers.
    • Finally, deploy asks Travis to upload the files in the docs folder (local-dir) to GitHub Pages, as specified through provider: pages. The on field tells Travis to run this deployment step only if the change that triggered the build happened on the master branch.

    For a full overview of the settings, see the Travis help article on GitHub Pages deployment. We don't have to specify the target branch the docs are pushed to, as it defaults to gh-pages.

  6. Notice that the deploy step also features a github-token field, which takes an environment variable. Travis needs this token to push changes to the gh-pages branch. To create it and make sure Travis can find it:

    • Go to your GitHub profile settings and create a new personal access token (PAT) under the Developer Settings tab. Give it a meaningful description, and make sure to generate a PAT that has either the public_repo (for public packages) or repo (for private packages) scope.

    • Copy the PAT and head over to the Travis repository settings, where you can specify environment variables. Make sure to name the environment variable GITHUB_PAT.

  7. The build should be good to go now! Commit your changes (DESCRIPTION, .gitignore and .travis.yml) to the master branch of your GitHub repo with a meaningful message.

  8. Travis will be notified and get to work: it builds the package, checks it, and if these steps are successful, it will build the pkgdown website and upload it to gh-pages.

  9. GitHub notices that a gh-pages branch has been created, and immediately hosts it at https://<name_or_org>.github.io/<package_name>. In our case, that is https://datacamp.github.io/tutorial. Have a look. What a beauty! Without any additional configuration, pkgdown has built a website with the GitHub README as the homepage, a full overview of all exported functions, and vignettes under the Articles section.
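
The deploy section in step 5 leaves out the target branch and relies on the pages provider's default. If you would rather spell it out, so the branch GitHub serves is visible right in the Travis config, here is a sketch that assumes the provider's target-branch option and leaves everything else as in step 5:

```yaml
deploy:
  provider: pages
  skip-cleanup: true
  github-token: $GITHUB_PAT
  keep-history: true
  local-dir: docs
  # Same value as the default; stated explicitly for readability
  target-branch: gh-pages
  on:
    branch: master
```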

From now on, every time you update the master branch of your package and the package checks pass, your documentation website is updated automatically. You no longer have to worry about keeping the generated files in sync with your in-package documentation and vignettes. You can also easily tweak the deployment process so it only deploys the documentation when you make a GitHub release. Along the way, you got continuous integration for your R package for free: the next time you make a change, Travis will notify you if it breaks any tests or checks.
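
Here is a minimal sketch of that release-only variant. It assumes that each GitHub release creates a git tag and that Travis builds tag pushes for your repo. Only the on condition of the deploy section changes. pkgdown::build_site() in after_success still runs after every successful build, but the result is uploaded only for tag builds:

```yaml
deploy:
  provider: pages
  skip-cleanup: true
  github-token: $GITHUB_PAT
  keep-history: true
  local-dir: docs
  on:
    # Deploy only from builds triggered by a git tag (e.g. a GitHub release),
    # instead of from every push to master
    tags: true
```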

Happy packaging!
