In this article we will discuss composing standard-evaluation interfaces (SE) and composing non-standard-evaluation interfaces (NSE) in R.
In R, the package tidyeval/rlang is a tool for building domain-specific languages, intended to allow easier composition of NSE interfaces.
To use it you must know some of its structure and notation. Here are some details paraphrased from the major tidyeval/rlang client, the package dplyr (see vignette('programming', package = 'dplyr')).
- ":=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
- "!!" substitution requires parentheses to safely bind (so the notation is actually "(!! )", not "!!").
- Left-hand-sides of expressions are names or strings, while right-hand-sides are quosures/expressions.
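The parenthesis point can be checked with base R alone. The sketch below only inspects how R's parser groups the symbols; it does not call rlang, and whether a given rlang version compensates for this grouping is not something we assume here.

```r
# Base R only: inspect how R's parser groups `!!x + 1`.
unparenthesized <- quote(!!x + 1)
parenthesized <- quote((!!x) + 1)

# Without parentheses the outermost call is `!`, and the innermost
# argument of the two negations is the whole expression `x + 1`.
outer_op_unparenthesized <- as.character(unparenthesized[[1]])
inner_arg <- unparenthesized[[2]][[2]]

# With parentheses the outermost call is `+`, which is what we meant.
outer_op_parenthesized <- as.character(parenthesized[[1]])

print(outer_op_unparenthesized)
print(deparse(inner_arg))
print(outer_op_parenthesized)
```

So as far as the parser is concerned, "!!x + 1" asks to unquote "x + 1", not "x".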
Example
Let's apply tidyeval/rlang notation to the task of building re-usable generic code in R.
# setup
suppressPackageStartupMessages(library("dplyr"))
packageVersion("dplyr")
## [1] '0.7.0'
vignette('programming', package = 'dplyr') includes the following example:
my_mutate <- function(df, expr) {
  expr <- enquo(expr)
  mean_name <- paste0("mean_", quo_name(expr))
  sum_name <- paste0("sum_", quo_name(expr))

  mutate(df,
    !!mean_name := mean(!!expr),
    !!sum_name := sum(!!expr)
  )
}
We can try this:
d <- data.frame(a=1)
my_mutate(d, a)
##   a mean_a sum_a
## 1 1      1     1
SE Example
From this example we can figure out how to use tidyeval/rlang notation to build a standard-interface version of a function that adds one to a column and lands the value in an arbitrary column:
tidy_add_one_se <- function(df, res_var_name, input_var_name) {
  input_var <- as.name(input_var_name)
  res_var <- res_var_name
  mutate(df, !!res_var := (!!input_var) + 1)
}

tidy_add_one_se(d, 'res', 'a')
##   a res
## 1 1   2
And we can re-wrap tidy_add_one_se into an "add one to self" function, as we show here:
tidy_increment_se <- function(df, var_name) {
  tidy_add_one_se(df, var_name, var_name)
}

tidy_increment_se(d, 'a')
##   a
## 1 2
NSE Example
We can also use the tidyeval/rlang notation more as it is intended: to wrap or compose a non-standard interface in another non-standard interface.
tidy_add_one_nse <- function(df, res_var, input_var) {
  input_var <- enquo(input_var)
  res_var <- quo_name(enquo(res_var))
  mutate(df, !!res_var := (!!input_var) + 1)
}

tidy_add_one_nse(d, res, a)
##   a res
## 1 1   2
And we even wrap this again as a new "add one to self" function:
tidy_increment_nse <- function(df, var) {
  var <- enquo(var)
  tidy_add_one_nse(df, !!var, !!var)
}

tidy_increment_nse(d, a)
##   a
## 1 2
(The above enquo() then "!!" pattern is pretty much necessary, as the simpler idea of just passing var through doesn't work.)
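To see why plain pass-through fails, here is a base R analogue (a sketch only; show_arg, wrap_naive and wrap_spliced are hypothetical names for illustration, and substitute()/bquote() stand in for enquo()/"!!"). A function that captures its argument sees the expression written at the call site, so a naive wrapper hands it the wrapper's own parameter name rather than the user's column.

```r
# Hypothetical helper: reports the expression it was called with.
show_arg <- function(x) substitute(x)

# Naive wrapper: passes its argument straight through.
wrap_naive <- function(var) show_arg(var)

# Wrapper that first captures the caller's expression, then splices it
# into a new call (a base R analogue of enquo() followed by "!!").
wrap_spliced <- function(var) {
  captured <- substitute(var)
  eval(bquote(show_arg(.(captured))))
}

naive_result <- wrap_naive(a)      # the symbol `var`, not `a`
spliced_result <- wrap_spliced(a)  # the symbol `a`

print(naive_result)
print(spliced_result)
```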
An Issue
We could try to use base::substitute() instead of quo_name(enquo()) in the non-standard-evaluation wrapper. At first this appears to work, but it runs into trouble when we try to compose non-standard-evaluation functions with each other.
tidy_add_one_nse_subs <- function(df, res_var, input_var) {
  input_var <- enquo(input_var)
  res_var <- substitute(res_var)
  mutate(df, !!res_var := (!!input_var) + 1)
}

tidy_add_one_nse_subs(d, res, a)
##   a res
## 1 1   2
However this seemingly similar variation is not re-composable in the same manner.
tidy_increment_nse_subs <- function(df, var) {
  var <- enquo(var)
  tidy_add_one_nse_subs(df, !!var, !!var)
}

tidy_increment_nse_subs(d, a)
## Error: LHS must be a name or string
Likely there is some way to get this to work, but my point is:
- The obvious way didn’t work.
- Some NSE functions can’t be re-used in standard NSE composition. You may not know which ones those are ahead of time. Presumably functions from major packages are so-vetted, but you may not be able to trust "one off compositions" to be safe to re-compose.
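One plausible reading of the failure (my interpretation, checked here only against base R and not against rlang internals): base::substitute() hands back whatever expression the caller literally wrote. When the caller writes a bare name we get a name; when the caller writes "!!var" we get a call to "!", which is neither a name nor a string. The helper capture_lhs below is a hypothetical name for illustration.

```r
# Hypothetical helper mimicking `res_var <- substitute(res_var)`.
capture_lhs <- function(res_var) substitute(res_var)

plain <- capture_lhs(res)      # caller wrote a bare name
spliced <- capture_lhs(!!var)  # caller wrote an unquote expression

plain_is_name <- is.name(plain)
spliced_is_name <- is.name(spliced)
spliced_class <- class(spliced)

print(plain_is_name)
print(spliced_is_name)
print(spliced_class)
```

The first capture is a name, the second is a call, which matches the "LHS must be a name or string" complaint.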
wrapr::let
It is easy to specify the function we want with wrapr as follows (both using standard evaluation, and using non-standard evaluation):
SE version
library("wrapr")

wrapr_add_one_se <- function(df, res_var_name, input_var_name) {
  wrapr::let(
    c(RESVAR= res_var_name, INPUTVAR= input_var_name),
    df %>% mutate(RESVAR = INPUTVAR + 1)
  )
}

wrapr_add_one_se(d, 'res', 'a')
##   a res
## 1 1   2
Standard composition:
wrapr_increment_se <- function(df, var_name) {
  wrapr_add_one_se(df, var_name, var_name)
}

wrapr_increment_se(d, 'a')
##   a
## 1 2
NSE version
Non-standard evaluation interface:
wrapr_add_one_nse <- function(df, res_var, input_var) {
  wrapr::let(
    c(RESVAR= substitute(res_var), INPUTVAR= substitute(input_var)),
    df %>% mutate(RESVAR = INPUTVAR + 1)
  )
}

wrapr_add_one_nse(d, res, a)
##   a res
## 1 1   2
wrapr::let()'s NSE composition pattern seems to work even when applied to itself:
wrapr_increment_nse <- function(df, var) {
  wrapr::let(
    c(VAR= substitute(var)),
    wrapr_add_one_nse(df, VAR, VAR)
  )
}

wrapr_increment_nse(d, a)
##   a
## 1 2
Abstract Syntax Tree Version
Or, if you are uncomfortable with macros being implemented through string substitution, one can use wrapr::let() in "language mode" (where it works directly on abstract syntax trees).
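To give a feel for what substitution on an abstract syntax tree means, here is a small base R sketch. It is not wrapr's implementation, just the general idea: placeholder symbols in a quoted expression are replaced by other symbols, with no text manipulation involved. The names template, mapping and rewritten are chosen for illustration.

```r
# A quoted template using placeholder symbols.
template <- quote(RESVAR <- INPUTVAR + 1)

# Map each placeholder to the concrete symbol we want.
mapping <- list(RESVAR = as.name("res"), INPUTVAR = as.name("a"))

# Replace symbols directly in the parse tree.
rewritten <- do.call(substitute, list(template, mapping))
print(rewritten)

# Evaluate the rewritten expression in an environment holding `a`.
env <- list2env(list(a = 1))
eval(rewritten, env)
result <- get("res", envir = env)
print(result)
```

Because whole symbols are replaced, a placeholder that happens to be a substring of a longer name is never touched.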
SE re-do
wrapr_add_one_se <- function(df, res_var_name, input_var_name) {
  wrapr::let(
    c(RESVAR= res_var_name, INPUTVAR= input_var_name),
    df %>% mutate(RESVAR = INPUTVAR + 1),
    subsMethod= 'langsubs'
  )
}

wrapr_add_one_se(d, 'res', 'a')
##   a res
## 1 1   2
wrapr_increment_se <- function(df, var_name) {
  wrapr_add_one_se(df, var_name, var_name)
}

wrapr_increment_se(d, 'a')
##   a
## 1 2
NSE re-do
wrapr_add_one_nse <- function(df, res_var, input_var) {
  wrapr::let(
    c(RESVAR= substitute(res_var), INPUTVAR= substitute(input_var)),
    df %>% mutate(RESVAR = INPUTVAR + 1),
    subsMethod= 'langsubs'
  )
}

wrapr_add_one_nse(d, res, a)
##   a res
## 1 1   2
wrapr_increment_nse <- function(df, var) {
  wrapr::let(
    c(VAR= substitute(var)),
    wrapr_add_one_nse(df, VAR, VAR),
    subsMethod= 'langsubs'
  )
}

wrapr_increment_nse(d, a)
##   a
## 1 2
Conclusion
tidyeval/rlang provides general tools to compose or daisy-chain non-standard-evaluation functions (i.e., write new non-standard-evaluation functions in terms of others). This tries to address the issue that it can be hard to compose non-standard function interfaces (i.e., one cannot parameterize them or program over them without a tool such as tidyeval/rlang). In contrast, wrapr::let() concentrates on standard evaluation, providing a tool that allows one to re-wrap non-standard-evaluation interfaces as standard-evaluation interfaces.
A lot of the tidyeval/rlang design is centered on treating variable names as lexical closures that capture an environment they should be evaluated in. This does make them more like general R functions (which also have this behavior).
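As a rough base R picture of that idea (a sketch only, not how rlang represents quosures internally), we can pair an expression with the environment it was written in. The names make_captured and offset are hypothetical.

```r
# Pair an expression with the environment where it was created.
make_captured <- function() {
  offset <- 10
  list(expr = quote(a + offset), env = environment())
}

captured <- make_captured()

# A different `offset` at the call site does not interfere: symbols not
# found in the data are looked up in the captured environment.
offset <- 1000
value <- eval(captured$expr, list(a = 1), captured$env)
print(value)
```

Here "a" comes from the data and "offset" from the captured environment, so the expression carries a long-lived binding around with it.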
However, creating so many long-term bindings is actually counter to some common data analyst practice.
The my_mutate(df, expr) example itself from vignette('programming', package = 'dplyr') even shows the pattern I am referring to: the analyst transiently pairs a chosen concrete data set to abstract variable names. One argument is the data and the other is the expression to be applied to that data (and only that data, with clean code not capturing values from environments).
Many methods are written expecting to be re-run on different data (for example predict()). This has the huge advantage that it documents your intent to change out what data is being applied (such as running a procedure twice, once on training data and once on future application data).
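A minimal base R illustration of that habit, using made-up data (the data frames train and future are invented for this sketch): the model refers to columns by name only, so the same fitted object can be applied to whichever data set we hand it.

```r
# Made-up training data with an exact linear relationship.
train <- data.frame(x = 1:10)
train$y <- 2 * train$x + 1

# Fit once; the formula names columns, not a particular data set.
model <- lm(y ~ x, data = train)

# Apply the same procedure to training data and to "future" data.
future <- data.frame(x = c(11, 12))
train_pred <- predict(model, newdata = train)
future_pred <- predict(model, newdata = future)

print(future_pred)
```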
This is a principle we also strongly apply in our join controller, which has no issue sharing variables out as an external spreadsheet, because it thinks of variable names (here meaning column names) as fundamentally being strings (not as quosures temporarily working "under cover" in string representations).
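The strings-first view is easy to sketch in base R. This is not the join controller itself, only an illustration with a hypothetical one-row specification table (spec): because the column names are plain strings, they survive a round trip through a CSV file and can still drive the computation afterwards.

```r
d <- data.frame(a = 1)

# Hypothetical specification: which column to read, which to write.
spec <- data.frame(res_var = "res", input_var = "a",
                   stringsAsFactors = FALSE)

# Share the specification as an external file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(spec, path, row.names = FALSE)
spec_back <- read.csv(path, stringsAsFactors = FALSE)

# Use the recovered strings directly as column names.
d[[spec_back$res_var]] <- d[[spec_back$input_var]] + 1
print(d)
```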