Writing my first custom CICD action for the pharmaverseblog

Edoardo Mancini

5 months ago

[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers].

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


The problem

Each pharmaverseblog post is tagged with one or more categories that describe the topics discussed within it. For instance, this post is tagged Technical. When making a new blog post, users are invited to select the tags from a curated list designed to split the posts according to categories that balance specificity and generality. Here is the list we currently use:

c("Metadata", "SDTM", "ADaM", "TLG", "Shiny", "Community", "Conferences", "Submissions", "Technical")

Users can add to this list; however, we have observed that when users do diverge from it, more often than not it is due to a typo. This has unfortunate effects on the pharmaverseblog, chiefly that the glossary of posts on our front page ends up split across near-duplicate categories.

Within the pharmaverseblog editor team, we wondered whether there was a simple way to police these tags a bit more, perhaps in an automated manner. Enter CICD checks!

What is CICD?

CICD (often written CI/CD) stands for Continuous Integration and Continuous Deployment, and is a catch-all term for automated code pipelines that ensure new code added to an existing codebase integrates seamlessly with the rest of it, without introducing unexpected behavior.

Often, when working in R projects hosted on GitHub, one encounters CICD in the form of checks that are triggered when making a pull request to the main branch. These checks can cover all sorts of aspects, ranging from correct spelling in your documentation all the way to running your functions' unit tests and checking that they all pass.

The pharmaverseblog already had three active CICD pipelines, which check spelling, style and links. For style and links, we did not write the checks ourselves; rather, we just activated open-source checks for our code repository. The spelling check, on the other hand, is a custom pipeline written by one of our blog editors, Stefan Thoma. So, what if we could write another custom CICD pipeline to check that new blog posts use tags from our selected list?


A (CICD) solution

TLDR

I wrote a function, check_post_tags(), which scans the .qmd file containing a blog post, extracts the tag(s), and checks them against the allowed list.
Then, I set up a script that loops check_post_tags() over all our blog posts and identifies the offending ones.
Following that, I used the {cli} package to format a nice error message listing all the offending posts and their tags.
Finally, in .github/workflows, I created a new CICD workflow named check_post_tags.yml, which simply executes my script upon every pull request to main.


Constructing the check

I decided that my strategy would be to write the main body of my CICD check as an R script. As I am relatively confident in R, the main challenge for me would be to then figure out how to automatically execute that script in a CICD pipeline.

After some playing around, I settled on the following roughly 30-line script:

# Get list of blog posts ----
# `pattern` is a regular expression, so match a literal ".qmd" at the end of the name
posts <- list.files("posts", recursive = TRUE, pattern = "\\.qmd$")

# Get vector of allowed tags ----
source("R/allowed_tags.R")

# Function to extract tags from a post and check them against the allowed list ----
check_post_tags <- function(post, allowed_post_tags = allowed_tags) {
  post_tags <- rmarkdown::yaml_front_matter(file.path("posts", post))$categories
  cross_check <- post_tags %in% allowed_post_tags
  problematic_tags <- post_tags[!cross_check]

  # Return a formatted message for an offending post; otherwise this returns NULL
  if (!all(cross_check)) {
    cli::format_message("The tag(s) {.val {problematic_tags}} in the post {.val {post}} are not from the allowed list of tags.")
  }
}

# Apply check_post_tags to all blog posts and find problem posts ----
check_results <- lapply(posts, check_post_tags)
error_messages <- unlist(Filter(Negate(is.null), check_results))

# Construct error message if one or more posts have problematic tags ----
if (length(error_messages) > 0) {
  error_messages <- c(error_messages, "Please select from the following tags: {.val {allowed_tags}}, or contact one of the maintainers.")
  # "x" marks each offending post; "" leaves the final instruction without a bullet
  names(error_messages) <- c(rep("x", length(error_messages) - 1), "")

  # cli_abort() formats the named vector as bullets and throws an error,
  # which makes Rscript (and therefore the CI step) fail
  cli::cli_abort(error_messages)
}

The script works as follows:

Get a full list of blog posts. These are all the .qmd files within the posts folder of the pharmaverseblog repo.
Load the vector of “allowed tags”, which is defined in the R/allowed_tags.R script.
Specify a function that, given a post:
1. Extracts the categories from the YAML header of the .qmd file.
2. Cross-checks the tags with the allowed ones.
3. If any tags are not allowed, uses the {cli} package to construct a nicely formatted error message.
Loop check_post_tags() over all blog posts using a simple lapply() call and keep only the non-NULL results, i.e. the error messages.
If there are any error messages, use {cli} again to construct a concatenated error message.
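To experiment with the tag-checking logic without cloning the repository, here is a minimal base-R sketch of the same idea on in-memory data. It assumes nothing beyond base R: the post names and tags in `post_tags` are hypothetical stand-ins for what `rmarkdown::yaml_front_matter()` would read from each .qmd header, `check_tags()` is a hypothetical simplification of check_post_tags(), and `sprintf()` stands in for the {cli} formatting so the sketch has no package dependencies.

```r
# Allowed tags, copied from the list at the top of this post
allowed_tags <- c("Metadata", "SDTM", "ADaM", "TLG", "Shiny",
                  "Community", "Conferences", "Submissions", "Technical")

# Hypothetical posts and their categories, standing in for the YAML headers
# that the real script reads with rmarkdown::yaml_front_matter()
post_tags <- list(
  "post_a.qmd" = c("ADaM", "Technical"),
  "post_b.qmd" = "ADaMs",
  "post_c.qmd" = c("Community", "Conference")
)

# Hypothetical simplification of check_post_tags(): return a message
# if any tag is not allowed, otherwise return NULL
check_tags <- function(post, tags, allowed = allowed_tags) {
  bad <- tags[!tags %in% allowed]
  if (length(bad) > 0) {
    sprintf("The tag(s) %s in the post %s are not from the allowed list of tags.",
            paste(dQuote(bad, FALSE), collapse = ", "), dQuote(post, FALSE))
  } else {
    NULL
  }
}

# Check every post, then drop the NULL results so only the messages remain
results <- Map(check_tags, names(post_tags), post_tags)
error_messages <- unlist(Filter(Negate(is.null), results))

writeLines(error_messages)
```

Because the tags live in a list rather than in files, `Map()` pairs each post name with its tags where the original script uses `lapply()` over file paths; the `Filter(Negate(is.null), ...)` step is the same in both.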

The final error message looks something like this:

✖ The tag(s) "ADaMs" in the post
  "2024-06-17_new_admiral_ex.../new_admiral_extension_packages_admiralpeds_coming_soon.qmd"
  are not from the allowed list of tags.
Please select from the following tags: "Metadata", "SDTM", "ADaM", "TLG", "Shiny",
"Community", "Conferences", "Submissions", and "Technical", or contact one of the maintainers.


Creating a pipeline for the check

As for creating a pipeline for the check: if you had asked me to do this a few months ago, I wouldn’t have known my left from my right. Luckily, I had recently attended a great CICD workshop at useR 2024 in Salzburg, led by Daphne Grasselly, Franciszek Walkowiak and Pawel Rucki. You can find the repository from their workshop here – it was invaluable for pointing me in the right direction. With a (very naive) Google search, I also found this video, which shows how to execute an R script automatically whenever a pull request is made to the main branch of a repo.

Through some trial and error, I was able to combine the above resources into a fairly short YAML file that sets up my CICD pipeline. Within the pharmaverseblog repository, the CICD pipelines live under .github/workflows. There, I added the following new workflow file, called check_post_tags.yml:

name: Check post tags

on:
  pull_request:
    branches:
      - 'main'

jobs:
  Check-post-tags:
    runs-on: ubuntu-latest
    container:
      image: "rocker/tidyverse:4.2.1"
    steps:
      - uses: actions/checkout@v4
      - name: Run check_post_tags
        run: source("R/check_post_tags.R")
        shell: Rscript {0}

It’s deceptively simple to read:

We execute the workflow upon any pull request to main.
When there is a pull request to main, we load the rocker/tidyverse:4.2.1 Docker image, which has all the {tidyverse} packages pre-installed.
Then we check out the pharmaverseblog repo and run the check_post_tags.R script. The shell: Rscript {0} line tells GitHub Actions to run the contents of run: as R code rather than as a shell command.
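With a custom shell such as `Rscript {0}`, GitHub Actions writes the `run:` contents to a temporary file and substitutes that file's path for `{0}`; whether the step passes then comes down to the exit status of that command. The sketch below mimics this locally in R, assuming `Rscript` ships in `R.home("bin")` as it does in standard R installations. `run_step()` is a hypothetical helper for illustration, not part of GitHub Actions.

```r
# Path to the Rscript executable bundled with the current R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical helper emulating `shell: Rscript {0}`: write the `run:` text to a
# temporary file, then call Rscript on that file (its path replaces {0})
run_step <- function(run_block) {
  script_file <- tempfile(fileext = ".R")
  on.exit(unlink(script_file), add = TRUE)
  writeLines(run_block, script_file)
  # With stdout and stderr discarded, system2() returns Rscript's exit status
  system2(rscript, shQuote(script_file), stdout = FALSE, stderr = FALSE)
}

# A run block that completes normally exits with status 0, so the step passes
pass_status <- run_step('message("All post tags are allowed.")')

# A run block that throws an error makes Rscript exit with a non-zero status,
# which is what marks the step (and the check) as failed
fail_status <- run_step('stop("Some posts use tags outside the allowed list.")')

cat("pass:", pass_status, " fail:", fail_status, "\n")
```

This is why no extra configuration is needed to make the check fail: any error raised by check_post_tags.R is enough.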

That’s it! If there are any problematic posts, the script will throw an error and the check will fail on the pull request.

Clicking on the “Details” link of the failed check shows the error message I constructed previously.

Otherwise, no error will be thrown, the check will pass, and the post is good to go (provided the other checks pass)!


Conclusion

The hardest aspect of the whole process was overcoming the mental barrier of getting started. CICD can be an overwhelming and mysterious topic, but it’s only as complicated as you make it. Of course, I’m still an absolute novice: my implementation above may not be perfect, and there are definitely more complex applications of CICD. Still, I was able to write code that solved my specific problem.

If you’d like to read more about CICD within the pharmaverseblog, I’d highly recommend this detailed post by Stefan Thoma on his own blog, where he discusses how the spelling check is implemented.

Do you have any use cases for CICD within the pharmaverseblog or any of your own projects? Please reach out – I’d love to hear from you!


Last updated

2025-01-21 14:40:10.832034


Citation

BibTeX citation:

@online{mancini2024,
  author = {Mancini, Edoardo},
  title = {Writing My First Custom {CICD} Action for the
    Pharmaverseblog},
  date = {2024-09-11},
  url = {https://pharmaverse.github.io/blog/posts/2024-09-11_writing_my_first.../writing_my_first_custom_ci_cd_action_for_the_pharmaverseblog.html},
  langid = {en}
}

For attribution, please cite this work as:

Mancini, Edoardo. 2024. “Writing My First Custom CICD Action for the Pharmaverseblog.” September 11, 2024. https://pharmaverse.github.io/blog/posts/2024-09-11_writing_my_first…/writing_my_first_custom_ci_cd_action_for_the_pharmaverseblog.html.


To leave a comment for the author, please follow the link and comment on their blog: pharmaverse blog.
