If you use the same code three times, write a function. If you write three such related functions, set up a package. But if you write three embarrassingly similar functions… write code to generate their code for you? In this post, we’ll deal with source code generation. We’ll differentiate scaffolding from generating code, and we’ll present various strategies observed in the wild.
This post was inspired by an excellent Twitter thread started by Miles McBain, from which we gathered examples. Thank you Miles!
Miles also pointed us to Alicia Schep’s rstudio::conf talk “Auto-magic package development”, which was a great watch/read!
Introduction
If you can repeat yourself, you’re lucky
When would you need to generate code? A possible use case is wrapping a web API with many, many endpoints that have a predictable structure (parameters, output format) that’s well documented (“API specs”, “API schema”).
In any case, to be able to generate code, you’ll have some sort of underlying data/ontology. Having that data (specs of a web API, of an external tool you’re wrapping, structured list of all your ideas, etc.), and some consistency in the different items, is quite cool, lucky you! Some of us deal with less tidy web APIs.
Scope of this post
In this post, we’ll look into scaffolding code (when your output is some sort of skeleton that still needs some human action before being integrated in a package) and generating code (you hit a button and end up with more functions and docs in the package for its users to find). We won’t look into packages exporting function factories.
Scaffolding code
“There was no way I was writing 146 functions from scratch”. Bob Rudis, GitHub comment.
Even without getting to the dream situation of code being cleanly generated, it can help your workflow to create function skeletons based on data.
The quote by Bob Rudis above refers to his work on crumpets, where he used the Swagger spec of the Gitea API to generate drafts of many, many functions. The idea was to have following commits edit functions enough to make them work without, as he said, starting from scratch.
The experimental scaffolder package by Yuan Tang “provides a comprehensive set of tools to automate the process of scaffolding interfaces to modules, classes, functions, and documentations written in other programming languages. As initial proof of concept, scaffolding R interfaces to Python packages is supported via reticulate.” The scaffold_py_function_wrapper() function takes a Python function as input and generates an R script skeleton (R code and docs, both of them needing further editing).
In these two cases, what’s generated is a template for both R code and the corresponding roxygen2 docs.
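To make the idea of scaffolding more concrete, here is a minimal sketch in base R. It is not the code used by crumpets or scaffolder: the endpoints list and the scaffold_endpoint() helper are hypothetical stand-ins for a real API spec and a real scaffolding tool. It turns each entry of a small spec into a function skeleton with roxygen2 comments, leaving TODO markers for the human edits that scaffolding still requires.

```r
# Hypothetical endpoint descriptions, standing in for entries of an API spec.
endpoints <- list(
  list(name = "get_user", path = "/users/{id}", params = c("id")),
  list(name = "list_repos", path = "/repos", params = c("owner", "page"))
)

# Turn one endpoint description into the lines of a documented skeleton.
scaffold_endpoint <- function(endpoint) {
  param_docs <- sprintf("#' @param %s TODO: describe.", endpoint$params)
  signature <- paste(endpoint$params, collapse = ", ")
  c(
    sprintf("#' TODO: title for %s", endpoint$name),
    "#'",
    param_docs,
    "#' @export",
    sprintf("%s <- function(%s) {", endpoint$name, signature),
    sprintf("  # TODO: call endpoint \"%s\"", endpoint$path),
    "  stop(\"Not implemented yet\")",
    "}",
    ""
  )
}

# One character vector holding all skeletons, ready for writeLines().
skeleton <- unlist(lapply(endpoints, scaffold_endpoint))
cat(skeleton, sep = "\n")
```

The output is valid R that does nothing useful yet, which is the point: later commits fill in the bodies and the docs.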
Generating code
“odin works using code generation; the nice thing about this approach is that it never gets bored. So if the generated code has lots of tedious repetitive bits, they’re at least likely to be correct (compared with implementing yourself).” Rich FitzJohn, odin README.
Quite convincing, right? But when and how does one generate code for an R package?
Generating code once or once in a while
For the package whose development prompted him to start the Twitter thread mentioned earlier, Miles McBain used code generation. The package creates wrappers around dplyr functions, that can in particular automatically ungroup() your data. Now say Miles decides to wrap a further dplyr function.
- He updates the list of wrappers.
- He can then run a make.R script that will source a build.R script that creates R files with actual R code and lines with roxygen2 code, before running devtools::document().
Code generating a function
    build_fn <- function(fn) {
      # name() is a helper defined elsewhere in the build script (not shown).
      fn_name <- name(fn)
      glue::glue("{fn_name} <- function(...) {{\n",
                 " dplyr::ungroup(\n",
                 " {fn}(...)\n",
                 " )\n",
                 "}}\n")
    }
Code generating docs
    build_fn_doco <- function(fn) {
      fn_name <- name(fn)
      glue::glue(
        "##' Ungrouping wrapper for {fn_name}\n",
        "##'\n",
        "##' The {PKGNAME} package provides a wrapper for {fn_name} that always returns\n",
        "##' ungrouped data. This avoids mistakes associated with forgetting to call ungroup().\n",
        "##'\n",
        "##' For original documentation see [{fn}()].\n",
        "##'\n",
        "##' Use [{fn_name}...()] to retain groups as per `{fn}`, whilst\n",
        "##' signalling this in your code.\n",
        "##'\n",
        "##' @title {fn_name}\n",
        "##' @param ... parameters for {fn}\n",
        "##' @return an ungrouped dataframe\n",
        "##' @author Miles McBain\n",
        "##' @export\n",
        "##' @seealso {fn}, {fn_name}..."
      )
    }
Voilà, there’s an updated R/ folder, and after running devtools::document() an updated man/ folder and NAMESPACE, and it all works.
You’ll have noticed the use of the glue package, that Alicia Schep also praised in her rstudio::conf talk, and that we’ve seen in many of the examples we’ve collected for this post.
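The whole build step can be sketched end to end in a few lines. The sketch below is an illustration under assumptions, not Miles’ actual build script: it uses base R’s sprintf() instead of glue to stay dependency-free, wraps base R functions instead of dplyr ones, and the names wrapped and build_wrapper() are made up for the example. It builds the source of each wrapper as a string, writes everything to a file as a build script would do in R/, and then sources that file to check that the generated code is valid.

```r
# Hypothetical list of wrappers: wrapper name -> wrapped function.
wrapped <- c(loud_mean = "base::mean", loud_sum = "base::sum")

# Build the source code (docs and function) of one wrapper as a string.
build_wrapper <- function(wrapper_name, fn) {
  paste(
    c(
      sprintf("#' Wrapper for %s", fn),
      sprintf("#' @param ... Arguments passed on to %s.", fn),
      "#' @export",
      sprintf("%s <- function(...) {", wrapper_name),
      sprintf("  message(\"Calling %s\")", fn),
      sprintf("  %s(...)", fn),
      "}",
      ""
    ),
    collapse = "\n"
  )
}

code <- mapply(build_wrapper, names(wrapped), wrapped)

# Write all wrappers to one file; a real build script would target R/.
out_file <- file.path(tempdir(), "generated-wrappers.R")
writeLines(code, out_file)

# Source the generated file into its own environment and try a wrapper.
generated <- new.env()
sys.source(out_file, envir = generated)
generated$loud_sum(1, 2, 3)
```

In a package, the sourcing step would be replaced by devtools::document(), which picks up the roxygen2 comments of the generated file.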
A similar setup is used by Carl Boettiger in eml.build for generating functions based on an XML spec. Tweet.
A further example is redux by Rich FitzJohn where code is generated based on Redis docs. Tweet.
xaringanthemer by Garrick Aden-Buie generates functions based on a tibble containing “Function arguments, doc strings and theme-specific defaults” that’s also used to generate a docs page. Tweet.
Code generator in a dedicated package
All the examples from the previous subsections had some sort of build scripts living in their package repo.
There’s no convention on what to call them and where to store them.
Now, R developers like their code packaged in package form.
Alicia Schep actually stores a package in the build/ folder of vlbuildr, vlmetabuildr, that creates vlbuildr anew from the Vegalite schema!
That’s meta indeed!
Fret not, the build/ folder also holds a script called build.R that unleashes the auto-magic.
Let us mention Alicia’s rstudio::conf talk again.
When to update the package?
We haven’t seen any code generating workflow relying on a Makefile or on a hook to an external source, so we assume such packages are updated once in a while when their maintainer amends, or notices an amendment of, the underlying ontology.
See e.g. the PR updating vlbuildr to support Vegalite 4.0, or the commit regenerating redis commands for 3.2 in redux.
Generating code at install time
In the previous cases of code generation, the R package source was similar to many R package sources out there. Now, we’ve also seen cases where the code is generated when installing the package. It means that the code generation has to be perfect, since there isn’t any human edit between the code generation and the code use. Let’s dive into a few examples.
Generating icon aliases in icon
In icon, an R package by Mitchell O’Hara-Wild that allows easy insertion of icons from Font Awesome, Academicons and Ionicons into R Markdown, to insert an archive icon one can type icon::fa("archive") or icon::fa_archive(), i.e. every possible icon has its own alias function, which pairs well with autocompletion, e.g. in RStudio when starting to type icon::fa_.
When typing ?icon::fa_archive one gets a man page entitled “Font awesome alias”, the same for all aliases.
How does it work?
Font files related to the fonts are stored in inst/. It’s the same for all three fonts, but let’s focus on what happens for Font Awesome.
In the R code (that’s executed when installing the package), there’s a line reading the icon names from a file.
Further below are a few very interesting lines:

    #' @evalRd paste("\\keyword{internal}", paste0('\\alias{fa_', gsub('-', '_', fa_iconList), '}'), collapse = '\n')
    #' @name fa-alias
    #' @rdname fa-alias
    #' @exportPattern ^fa_
    fa_constructor <- function(...) fa(name = name, ...)
    for (icon in fa_iconList) {
      formals(fa_constructor)$name <- icon
      assign(paste0("fa_", gsub("-", "_", icon)), fa_constructor)
    }
    rm(fa_constructor)
When documenting the package, the man page “fa-alias” is created. The @evalRd tag ensures aliases for all icons from fa_iconList get an \alias{} line in the “fa-alias” man page. The @exportPattern tag ensures a line exporting all functions whose name starts with fa_ is added to NAMESPACE. This part happens before installing the package, every time the documentation is updated by the package maintainer.
The fa_ functions are created at install time by the for loop: at each iteration, the default value of the name argument of fa_constructor is set to the current icon, and the resulting function is assigned to the corresponding fa_ name. The function factory fa_constructor is then removed.
The code generation allows an easy update to new Font Awesome versions, with a very compact source code.
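The alias-creating loop can be tried outside the package with a self-contained sketch. The icon names and the fa() stand-in below are assumptions made for the example (the real fa() builds an icon object and the real names are read from a file), and the aliases are assigned into a dedicated environment rather than into a package namespace.

```r
# Hypothetical icon names; the real package reads them from a file.
icon_names <- c("archive", "address-book")

# Stand-in for the real fa(): it just returns an HTML string.
fa <- function(name, ...) paste0("<i class=\"fa fa-", name, "\"></i>")

# Template function: `name` has no value yet, it is filled in by the loop.
fa_constructor <- function(...) fa(name = name, ...)

aliases <- new.env()
for (icon in icon_names) {
  # Add (or overwrite) a `name` argument whose default is the current icon.
  formals(fa_constructor)$name <- icon
  # Store a copy of the function under a syntactic alias such as fa_archive.
  assign(paste0("fa_", gsub("-", "_", icon)), fa_constructor, envir = aliases)
}
rm(fa_constructor)

aliases$fa_address_book()
```

Each alias is an independent copy of the template, so later iterations of the loop don’t affect the functions already assigned.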
Generating an up-to-date API wrapper in civis
Another interesting example is provided by the civis package, an R client for the Civis platform. Its installation instructions state that when installing the package from source, all functions corresponding to the latest API version will be created.
What happens exactly when the package is installed from source? A configure script is run (configure or configure.win). Such scripts are automatically run when installing a package from source. The script runs tools/run_generate_client.R:
"${R_HOME}"/bin/Rscript tools/run_generate_client.R
That R script fetches the API spec and writes code and roxygen2 docs in R/generated_client.R.
When the package is not installed from source, the users get the R/generated_client.R that’s last been generated by the package maintainer, so if the Civis platform itself was updated in the meantime, the users might find a platform endpoint is missing from the civis package.
The approach used by civis has the clear advantage of allowing a perfect synchronization between the wrapped platform and the package.
Creating functions lists and R6 methods in minicss
In minicss by mikefc, “Lists of CSS property information is turned into function lists and R6 methods.” See aaa.R and prop_transform.R. As in most examples, the code is generated as a string, but in that case it’s not written to disk: it becomes code via the use of eval() and parse().
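As an illustration of that last point, here is a minimal sketch of turning strings into functions with parse() and eval(). It is not minicss code: the properties vector is a made-up stand-in for the package’s much richer CSS property data, and the generated functions are deliberately trivial.

```r
# Hypothetical CSS properties, standing in for minicss' property data.
properties <- c("color", "font-size")

css <- list()
for (prop in properties) {
  fn_name <- gsub("-", "_", prop)
  # The source of the function, as a string that never touches the disk.
  code <- sprintf("function(value) paste0(\"%s: \", value, \";\")", prop)
  # parse() turns the string into an expression, eval() into a function.
  css[[fn_name]] <- eval(parse(text = code))
}

css$font_size("12px")
```

Because the property name is pasted into the source before parsing, each generated function carries its own property as a literal and doesn’t depend on the loop variable.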
Generate C++ bindings with Rcpp::compileAttributes()
Rcpp::compileAttributes() generates code (the bindings required to call C++ functions from R) after scanning a package’s source files. Find more information in the Rcpp vignette about attributes. You could call the function “whenever functions are added, removed, or have their signatures changed”, but the aforementioned vignette also states “if you are using either RStudio or devtools to build your package then the compileAttributes function is called automatically whenever your package is built”.
Generating code on-the-fly
One step further, one might generate code on-the-fly, i.e. as users run the package.
- The chromote package “generates auto-completable R6 methods at runtime”, by which Alan Dipert probably referred to these code lines

      # Populate methods while the connection is being established.
      protocol_spec <- jsonlite::fromJSON(self$url("/json/protocol"), simplifyVector = FALSE)
      self$protocol <- process_protocol(protocol_spec, self$.__enclos_env__)
      # self$protocol is a list of domains, each of which is a list of
      # methods. Graft the entries from self$protocol onto self
      list2env(self$protocol, self)

  that are called when creating a chromote object. The process_protocol() function converts the Chrome Devtools Protocol JSON to a list of functions.
- In stevedore by Rich FitzJohn, a Docker client for R, functions are generated when one connects to the Docker server via stevedore::docker_client(), selecting the most appropriate version based on the server (possible specs are stored in inst/spec as compressed YAML files). In the author’s own words, in this package the approach is “not going through the text representation at all and using things like as.function and call/as.call to build up functions and expressions directly”. This happens in swagger_args.R.

Thanks to Rich for many useful comments on this post.
Conclusion
In this post we explored different aspects of source code scaffolding and generation in R packages.
We’ve mentioned examples of code scaffolding (gitea, scaffolder), of code generation by a script (wisegroup, eml.build, redux, xaringanthemer) or by a package (vlbuildr and vlmetabuildr) before package shipping, of code generation at install time (icon, civis, minicss, Rcpp::compileAttributes()) and of code generation at run time (chromote, stevedore).
Many of these examples used some form of string manipulation, in base R or with glue, to either generate an R script and its roxygen2 docs or code using eval() and parse() (minicss). One of them doesn’t use any text representation, relying on as.function and call/as.call instead (stevedore). icon also doesn’t write R files.
In the more general context of automatic programming, there are also things called “generative programming”, and “low-code applications” (like tidyblocks?). As much as one enjoys writing R code, it’s great to be able to write less of it sometimes, especially when it gets too routine.
Do you use source code generation in R? Don’t hesitate to add your own use case and setup in the comments below.