Short note about tidyeval
Following Jenny Bryan’s talk on tidyeval in the last rstudio::conf 2019, I decided to write this short note (mainly as a reminder to myself).
What is tidyeval?
Tidy evaluation is the tidyverse's framework for non-standard evaluation. It allows us to refer to column names without quoting them, and to pass those names between functions. This is the “classic” behaviour of most tidyverse functions. For example, we use:
library(tidyverse)

mtcars %>%
  select(mpg, cyl)
##                      mpg cyl
## Mazda RX4           21.0   6
## Mazda RX4 Wag       21.0   6
## Datsun 710          22.8   4
## Hornet 4 Drive      21.4   6
## Hornet Sportabout   18.7   8
## Valiant             18.1   6
## Duster 360          14.3   8
## Merc 240D           24.4   4
## Merc 230            22.8   4
## Merc 280            19.2   6
## Merc 280C           17.8   6
## Merc 450SE          16.4   8
## Merc 450SL          17.3   8
## Merc 450SLC         15.2   8
## Cadillac Fleetwood  10.4   8
## Lincoln Continental 10.4   8
## Chrysler Imperial   14.7   8
## Fiat 128            32.4   4
## Honda Civic         30.4   4
## Toyota Corolla      33.9   4
## Toyota Corona       21.5   4
## Dodge Challenger    15.5   8
## AMC Javelin         15.2   8
## Camaro Z28          13.3   8
## Pontiac Firebird    19.2   8
## Fiat X1-9           27.3   4
## Porsche 914-2       26.0   4
## Lotus Europa        30.4   4
## Ford Pantera L      15.8   8
## Ferrari Dino        19.7   6
## Maserati Bora       15.0   8
## Volvo 142E          21.4   4
The two variables were selected out of the mtcars data set, and we specified them as bare names without any quotation marks. They are symbols, not character strings (although they could also be specified as character strings; select is smart enough that way).
But assume we want to pass variables “tidy style” between functions which do different operations.
Variation one – a basic example
We’ll start simple: a function which has two parameters. The first parameter is a dataset. The second parameter is a grouping variable. All other variables in the data set will have their mean computed using summarize_all.
test1 <- function(dataset, groupby_vars){
  # capture the unquoted column name supplied by the caller
  grouping_vars <- enquo(groupby_vars)
  dataset %>%
    group_by(!! grouping_vars) %>%
    # mean is passed directly; funs() is deprecated in recent dplyr releases
    summarize_all(mean)
}

mtcars %>%
  select(cyl:carb) %>%
  test1(groupby_vars = cyl)
## # A tibble: 3 x 10
##     cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
## 2     6  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
## 3     8  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5
We can see that mtcars was grouped by cyl, which was passed as a name (not characters). The function test1 took it, then enquo()-ed it, and eventually used it in the tidy chain using !!.
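The function enquo() captures the expression supplied by the caller, together with the environment it came from, in a “quosure”. The !! operator then unquotes the quosure inside group_by(), so that it is evaluated as a column of mtcars.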
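To make the quosure less abstract, here is a minimal sketch that captures a column name and inspects the result with a few rlang helpers. The helper name capture_var is hypothetical and only serves the illustration; it is not part of the original functions in this post.

```r
library(rlang)

# Hypothetical helper: captures its argument and returns the quosure as is.
capture_var <- function(var) {
  enquo(var)
}

q <- capture_var(cyl)

is_quosure(q)      # check that we really hold a quosure
quo_get_expr(q)    # the captured expression (the symbol cyl)
as_label(q)        # the same expression as a character label

# Evaluating the quosure with mtcars as the data mask looks up the column,
# which is roughly what happens when !! is used inside a dplyr verb.
cyl_values <- eval_tidy(q, data = mtcars)
```

Nothing is looked up at capture time; the column is only resolved once the quosure is evaluated against a data frame.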
Passing arguments using ...
A slightly more complex situation is passing multiple arguments to the function. Assume that this time we want to construct a function which takes one variable to group by, followed by the variables to be summarized:
test2 <- function(dataset, groupby_vars, ...){
  # capture the grouping column; the dots are forwarded untouched
  grouping_vars <- enquo(groupby_vars)
  dataset %>%
    group_by(!! grouping_vars) %>%
    # mean is passed directly; funs() is deprecated in recent dplyr releases
    summarize_at(vars(...), mean)
}

mtcars %>%
  select(cyl:carb) %>%
  test2(groupby_vars = cyl, disp:drat)
## # A tibble: 3 x 4
##     cyl  disp    hp  drat
##   <dbl> <dbl> <dbl> <dbl>
## 1     4  105.  82.6  4.07
## 2     6  183. 122.   3.59
## 3     8  353. 209.   3.23
What happened is that test2 treats the grouping variable the same way that test1 treated it, but it also passes the variables disp:drat along to vars() through the dots, without any enquo() or !!.
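The dots can be forwarded to other dplyr verbs in the same way. The following is a small sketch under the assumption that we only want to count rows per group; the helper name count_by is made up for this illustration and does not appear elsewhere in the post.

```r
library(dplyr)

# Hypothetical helper: forwards any number of unquoted grouping columns
# straight to group_by() through the dots, then counts rows per group.
count_by <- function(dataset, ...) {
  dataset %>%
    group_by(...) %>%
    summarize(n = n()) %>%
    ungroup()
}

# Group by two columns at once; no enquo() or !! is needed for the dots.
counts <- mtcars %>% count_by(cyl, am)
counts
```

Because the dots are passed along untouched, the caller decides how many grouping columns to supply.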
Maximum flexibility – multiple enquo()s
Sometimes passing the dots, i.e., ..., is not enough.
For example, we may want to specify different behaviour for different columns of the data frame (e.g., compute the mean for some and the standard deviation for others). In such cases we need a more flexible version. We can extend the flexibility of this approach by using multiple enquo()s.
test3 <- function(dataset, groupby_vars, computemean_vars, computestd_vars){
  # capture each argument separately
  grouping_vars <- enquo(groupby_vars)
  mean_vars <- enquo(computemean_vars)
  std_vars <- enquo(computestd_vars)
  # means for one set of columns, joined to standard deviations for the other
  # (mean and sd are passed directly; funs() is deprecated in recent dplyr releases)
  dataset %>%
    group_by(!! grouping_vars) %>%
    summarize_at(vars(!!mean_vars), mean) %>%
    left_join(dataset %>%
                group_by(!! grouping_vars) %>%
                summarize_at(vars(!!std_vars), sd))
}

mtcars %>%
  test3(groupby_vars = cyl, disp:drat, wt:carb)
## Joining, by = "cyl"
## # A tibble: 3 x 10
##     cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4  105.  82.6  4.07 0.570  1.68 0.302 0.467 0.539 0.522
## 2     6  183. 122.   3.59 0.356  1.71 0.535 0.535 0.690 1.81
## 3     8  353. 209.   3.23 0.759  1.20 0     0.363 0.726 1.56
In the resulting table, the first column cyl is the grouping variable, columns disp through drat have the mean of the corresponding variables, and columns wt through carb have their standard deviation computed.
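For readers on newer package versions, the same result can probably be reached without the join. The sketch below is an alternative, not the approach used in this post: it assumes dplyr 1.0.0 or later (for across()) and rlang 0.4.0 or later (for the {{ }} shorthand, which stands in for enquo() followed by !!). The function name summarize_mean_sd and its parameter names are hypothetical.

```r
library(dplyr)

# Hypothetical alternative to the join-based version:
# {{ x }} captures and unquotes the argument in one step,
# and across() applies a function to a selection of columns.
summarize_mean_sd <- function(dataset, group_var, mean_vars, sd_vars) {
  dataset %>%
    group_by({{ group_var }}) %>%
    summarize(
      across({{ mean_vars }}, mean),  # means for the first selection
      across({{ sd_vars }}, sd)       # standard deviations for the second
    )
}

res <- summarize_mean_sd(mtcars, cyl, disp:drat, wt:carb)
res
```

The two column selections must not overlap here, since each column can only appear once in the summary.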
Additional uses of tidy evaluation
Tidy evaluation is very useful when building flexible functions, but also when using the ggplot2 syntax within functions, and even more so in Shiny applications, where input parameters need to be passed on as grouping or plotting variables.
However, this is a topic for a different post.
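Still, as a small taste, here is a minimal sketch of the same enquo() and !! pattern inside a plotting function. It assumes ggplot2 3.0.0 or later, which accepts unquoting inside aes(); the function name scatter_by is hypothetical.

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: the x and y columns are passed as bare names,
# captured with enquo(), and unquoted inside aes().
scatter_by <- function(dataset, x_var, y_var) {
  x_var <- enquo(x_var)
  y_var <- enquo(y_var)
  ggplot(dataset, aes(x = !!x_var, y = !!y_var)) +
    geom_point()
}

p <- scatter_by(mtcars, wt, mpg)
```

Printing p draws a scatter plot of mpg against wt, with the axis labels taken from the captured column names.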
Conclusions
Tidy evaluation gives you powerful tools: it offers a great degree of flexibility, but it’s a bit tricky to master.
My suggestion is that if you’re trying to master tidy evaluation, just think about your use case: which of the three variations presented in this post does it resemble most?
Work your way up – from the simplest version (if it works for you) and up to the complex (but most flexible) version.