dplyr functions with strings
Let’s say we have a simple data frame as below and we want to select the female rows only.
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))
df
##   id gender
## 1  1   male
## 2  2 female
## 3  3   male
## 4  4 female
## 5  5 female
library(dplyr)
df %>% filter(., gender == "female")
##   id gender
## 1  2 female
## 2  4 female
## 3  5 female
The filter() function in dplyr (and other similar functions from the
package) uses something called non-standard evaluation (NSE). In NSE, a
bare name such as gender is captured unevaluated and then looked up as a
column of the data frame, so writing gender without quotes in the call
above works fine. This is in contrast with functions that use standard
evaluation (SE). For example, the following code, which indexes a column
of a data frame, gives an error.
df[, gender] # this will give error
## Error in `[.data.frame`(df, , gender): object 'gender' not found
This is because data frame indexing with [] uses SE, in which names are
treated as references to values. To make this work we need to pass a
string.
df[, "gender"] # this works
## [1] male   female male   female female
## Levels: female male
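Because [] uses standard evaluation, the column name can also be stored in a variable first. The sketch below illustrates this with the same toy data; the variable name col is only an illustrative choice.

```r
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

# `col` is a hypothetical variable holding the column name as a string
col <- "gender"

# Under SE, `col` is evaluated to "gender" before indexing,
# so both calls below select the gender column
by_bracket <- df[, col]
by_double_bracket <- df[[col]]
```

This is exactly what does not carry over to filter(), which is the problem discussed next.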
This behaviour of dplyr is usually quite helpful, as it keeps expressions
succinct. Occasionally, however, we want to pass a string to the
function, for example when we use filter() in a loop or as part of a more
complicated function. Because of NSE, the code below will not work. Note
that it doesn’t throw an error but returns 0 rows. This is because
filter() finds no column called var in the data frame, so it falls back
to the variable var in the calling environment and compares the string
"gender" with "female", which is never true!
var <- "gender" # this is a string
val <- "female" # this is a string
df %>% filter(., var == val)
## [1] id     gender
## <0 rows> (or 0-length row.names)
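One way to see where the 0 rows come from is to evaluate the condition on its own, outside filter(). This is a minimal sketch, assuming the data frame has no column that is itself called var or val; the name cmp is only illustrative.

```r
var <- "gender"
val <- "female"

# With no column named `var` or `val`, the condition compares two plain strings
cmp <- var == val
cmp  # FALSE

# filter() applies that single FALSE to every row, so no row is kept
```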
To make this work, we need a trick like the one below. The sym() function
converts a string to a symbol, much as as.name() does in base R. Then use
!! to say that you want to unquote the input so that it’s evaluated, not
quoted.
var <- "gender" # this is a string
val <- "female" # this is a string
df %>% filter(., !!sym(var) == val)
##   id gender
## 1  2 female
## 2  4 female
## 3  5 female
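The loop scenario mentioned earlier could look roughly like the sketch below. The helper filter_by_string(), the extra group column and the conditions list are all made up for illustration; only sym() and !! come from the approach above.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"),
                 group = c("a", "b", "b", "a", "b"))  # extra column, for illustration only

# Hypothetical helper: keep the rows where the column named by `var` equals `val`
filter_by_string <- function(data, var, val) {
  data %>% filter(!!sym(var) == val)
}

# Column names (as strings) mapped to the value to keep
conditions <- list(gender = "female", group = "b")

# Apply one filter per column name, collecting the results by name
results <- list()
for (var in names(conditions)) {
  results[[var]] <- filter_by_string(df, var, conditions[[var]])
}

results$gender
results$group
```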
Alternatively, you can use the get() function, which returns the value
of a named object.
var <- "gender" # this is a string
val <- "female" # this is a string
df %>% filter(., get(var) == val)
##   id gender
## 1  2 female
## 2  4 female
## 3  5 female
The first method addresses the NSE of the filter() function, while the
second method tricks it into getting the job done. Both work just fine.
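A third option, not part of the two methods above, is the .data pronoun that dplyr provides for subsetting columns by string. The sketch below assumes a dplyr version that supports .data[[ ]] inside filter(). One practical difference from get() is that .data[[var]] raises an error when the column does not exist, whereas get() may silently pick up an object of the same name from outside the data frame.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

var <- "gender"  # column name as a string
val <- "female"  # value to keep

# .data[[var]] looks the column up by its string name inside the data frame
females <- df %>% filter(.data[[var]] == val)
females
```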
If we don’t want to pass a string but a name instead, the tidyverse
has recently introduced the {{}} (‘curly-curly’) operator for tidy
evaluation.
library(tidyverse)
squirrels <- read_csv(str_c(
  "https://raw.githubusercontent.com/",
  "rfordatascience/tidytuesday/master/",
  "data/2019/2019-10-29/nyc_squirrels.csv"))
count_groups <- function(df, groupvar){
  df %>%
    group_by({{ groupvar }}) %>%
    count()
}
count_groups(squirrels, climbing)
Now we can pass the variable name climbing to group_by() using the {{}}
operator. In the past, we had to use the more cumbersome !! enquo()
(quote-unquote) trick to achieve something similar.
count_groups_old <- function(df, groupvar){
  df %>%
    group_by(!! enquo(groupvar)) %>%
    count()
}
count_groups_old(squirrels, climbing)
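To try both versions without downloading the squirrel data, here is a small sketch that reuses the toy gender data frame from the start of the post. It assumes a dplyr/rlang installation recent enough to support {{ }}; the object names new_way and old_way are only illustrative.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

# Curly-curly version: forwards the bare column name to group_by()
count_groups <- function(df, groupvar) {
  df %>%
    group_by({{ groupvar }}) %>%
    count()
}

# Older equivalent: capture the argument with enquo(), then unquote it with !!
count_groups_old <- function(df, groupvar) {
  df %>%
    group_by(!! enquo(groupvar)) %>%
    count()
}

new_way <- count_groups(df, gender)
old_way <- count_groups_old(df, gender)
new_way
```

Both calls should give one row per gender with its count in the column n.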
Happy hacking!