Site icon R-bloggers

Quasiquotation in R via bquote()

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In August of 2003 Thomas Lumley added bquote() to R 1.8.1. This gave R and R users an explicit Lisp-style quasiquotation capability. bquote() and quasiquotation are actually quite powerful. Professor Thomas Lumley should get, and should continue to receive, a lot of credit and thanks for introducing the concept into R.

In fact bquote() is already powerful enough to build a version of dplyr 0.5.0 with quasiquotation semantics quite close (from a user perspective) to what is now claimed in tidyeval/rlang.

Let’s take a look at that.

For our example problem: suppose we wanted to average a ratio of columns by groups, but want to write code so that what columns are to be used is specified elsewhere (perhaps coming in as function arguments). You would write code such as the following in dplyr 0.7.*:

suppressPackageStartupMessages(library("dplyr"))

# define our parameters
# pretend these come from far away
# or as function arguments.
group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))
count_nm <- as.name("group_count")

And then you would execute something like the following. We are not running the following block (so there is no result), as we have dplyr 0.5.0 installed, which does not yet accept the quasiquotation notation.

# apply a parameterized pipeline using rlang::!!
mtcars %>%
  group_by(!!group_nm) %>%
  mutate(!!derived_nm := !!num_nm/!!den_nm) %>%
  summarize(
    !!mean_nm := mean(!!derived_nm),
    !!count_nm := n()
  ) %>%
  ungroup() %>%
  arrange(!!group_nm)

Writing the above code requires two ideas:

dplyr 0.5.0 (released June 2016) did not yet incorporate quasiqotation, so we can not run the above code in dplyr 0.5.0 unless we take the steps of:

This looks like the following.

# apply a partially parameterized pipeline using bquote
eval(bquote(
  
mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(TMP1 = .(num_nm)/.(den_nm)) %>%
  rename_(.dots = setNames("TMP1", as.character(derived_nm))) %>%
  summarize(
    TMP2 = mean(.(derived_nm)),
    TMP3 = n()
  ) %>%
  ungroup() %>%
  rename_(.dots = setNames("TMP2", as.character(mean_nm))) %>%
  rename_(.dots = setNames("TMP3", as.character(count_nm))) %>%
  arrange(.(group_nm))

))
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13

That nearly solved the problem (notice we were not able to control the left-hand sides of assignment-like expressions). In December of 2016 we publicly announced and released the let() adapter that addressed almost all of these concerns at the user level (deliberately not modifying dplyr code). The substitutions are specified by name (instead of by position), and the code must still be executed in a let() block to get the desired control.

This looks like the following.

alias <- 
  c(group_nm = as.character(group_nm),
    num_nm = as.character(num_nm),
    den_nm  = as.character(den_nm),
    derived_nm  = as.character(derived_nm),
    mean_nm = as.character(mean_nm),
    count_nm  = as.character(count_nm))

wrapr::let(alias,
    
mtcars %>%
  group_by(group_nm) %>%
  mutate(derived_nm = num_nm/den_nm) %>%
  summarize(
    mean_nm = mean(derived_nm),
    count_nm = n()
  ) %>%
  ungroup() %>%
  arrange(group_nm)

)
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13

We felt this was the least intrusive way to give dplyr users useful macro substitution ability over dplyr pipelines. So by December of 2016 users had public access to a complete solution. More on wrapr::let() can be found here.

In May of 2017 rlang was publicly announced and released. dplyr adapted new rlang based quasiquotation ability directly into many of its functions/methods. This means the user does not need to draw a substitution block around their code, as the package authors have written the block. At first glance this is something only the dplyr package authors could do (as they control the code), but it turns out one can also easily adapt (or wrap) existing code in R.

A very small effort (about 25 lines of code now found in bquote_call()) is needed to specify argument substitution semantics using bquote(). The bulk of that code is adding the top-level := to = substitution. Then it takes only about 10 lines of code to write a function that re-maps existing dplyr functions to new functions using the bquote() adapter bquote_function(). And a package author could choose to directly integrate bquote() transforms at even smaller cost.

Lets convert dplyr 0.5.0 to bquote() based quasiquotation. We can do that by running our adapter over the names of 29 common dplyr methods, which places bquote() enhanced versions in our running environment. The code to do this is below.

# wrap some dplyr methods
nms <- c("all_equal", "anti_join", "arrange",
         "as.tbl", "bind_cols", "bind_rows",
         "collect", "compute", "copy_to", "count",
         "distinct", "filter", "full_join",
         "group_by", "group_indices",
         "inner_join", "is.tbl", "left_join",
         "mutate", "rename", "right_join",
         "select", "semi_join", "summarise",
         "summarize", "tally", "tbl", "transmute",
         "ungroup")
env <- environment()
for(fname in nms) {
  fn <- getFromNamespace(fname, ns = "dplyr")
  assign(fname, 
         # requires wrapr 1.6.4 or newer
         wrapr::bquote_function(fn),
         envir = env)
}

With these adaptions in place we can write partially substituted or parametric code as follows.

packageVersion("dplyr")
## [1] '0.5.0'
# apply a parameterized pipeline using bquote
mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(.(derived_nm) := .(num_nm)/.(den_nm)) %>%
  summarize(
    .(mean_nm) := mean(.(derived_nm)),
    .(count_nm) := n()
  ) %>%
  ungroup() %>%
  arrange(.(group_nm))
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13

This pretty much demonstrates that bquote() was up to the job the whole time. All one had to do is decide to incorporate it into the source code (something only the package author could do) or adapt the functions (a bit invasive, but something any user can do).

The user visible difference is that bquote() uses .(x) to mark substitution of x. rlang currently uses !!x to mark substitution of x (syntactically reminiscent of Lisp’s back-tick). Earlier versions of rlang used UQ(x), uq(x), and even ((x)) (signs of this can be found here). Other famous substitution notations include %x (from C/printf), {} (from Python PEP 498 — Literal String Interpolation, and used in glue.), $ and ${} (from bash), and many more.

There are some minor technical differences between the bquote() and rlang solutions. But one can write corner case code that fails either method at will. We are not discussing the rlang !!! "splice" syntax, as the need for it is avoidable by avoiding over-use of "..." tricks. For actual user work both systems are going to look very similar.

Overall we are discussing avoiding interfaces that are convenient for interactive work, yet difficult to parameterize or program over. Quasiquotation is a tool to overcome package design limitations or design flaws. You don’t see quasiquotation very much in R, as many R developers work hard on their end to make sure the user does not require it.

For R package and function developers my advice is:

As a developer, you should always provide an escape hatch: an alternative version of the function that uses standard evaluation.

A lot of the technical details (including defining standard and non-standard evaluation) are covered in our long article on macro capabilities in R.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.