
Dplyr functions with string

[This article was first published on R | databentobox, and kindly contributed to R-bloggers].

Let’s say we have a simple data frame as below and we want to select the female rows only.

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))
df

## id gender
## 1 1 male
## 2 2 female
## 3 3 male
## 4 4 female
## 5 5 female

library(dplyr)
df %>%
  filter(., gender == "female")

## id gender
## 1 2 female
## 2 4 female
## 3 5 female

The filter() function in dplyr (like other similar functions in the package) uses something called non-standard evaluation (NSE). Under NSE, a function captures the expression you type instead of evaluating it straight away, and then evaluates it in the context of the data frame, so a bare name is looked up as a column. That is why just using ‘gender’ (without quotes) in the function above works fine. This is in contrast with functions that use standard evaluation (SE). For example, the following code, which indexes a column of a data frame, will give an error.

df[, gender] # this will give error

## Error in `[.data.frame`(df, , gender): object 'gender' not found

This is because data frame indexing with [] uses SE, in which names are treated as references to values: R looks for an object called gender in the environment and finds none. To make this work, we need to pass a string.

df[, "gender"] # this works

## [1] male female male female female
## Levels: female male
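Because [] uses standard evaluation, the column name can just as well come from a variable. The following is a minimal sketch of that point; the variable name col is only illustrative and does not appear in the original post.

```r
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

col <- "gender"  # hypothetical variable holding the column name as a string

from_literal  <- df[, "gender"]  # string written directly
from_variable <- df[, col]       # `col` is evaluated to "gender" before indexing

# Both calls return the same column
same_result <- identical(from_literal, from_variable)
same_result
```

This is exactly the behaviour that filter() does not give us out of the box, which is what the rest of the post deals with.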

This behaviour of dplyr is usually quite helpful as it makes expressions succinct. However, occasionally we want to pass a string to the function, for example when we use filter() in a loop or as part of a more complicated function. Because of NSE, the code below will not work. Note that it doesn’t throw an error but returns 0 rows. This is because filter() looks for a column literally named ‘var’, cannot find one in the data frame, and falls back to the variable var in the calling environment. The condition then compares the string "gender" with the string "female", which is FALSE for every row!

var <- "gender" # this is a string
val <- "female" # this is a string
df %>%
  filter(., var == val)

## [1] id gender
## <0 rows> (or 0-length row.names)
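One way to see where the 0 rows come from is to evaluate the condition on its own, outside filter(). This is a small sketch rather than a description of dplyr internals; the names condition and no_rows are made up for illustration.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

var <- "gender"
val <- "female"

# Neither `var` nor `val` is a column of df, so the comparison is made
# between the two strings themselves
condition <- var == val  # "gender" == "female"
condition                # FALSE

# A single FALSE is recycled across all rows, so no row is kept
no_rows <- df %>% filter(condition)
nrow(no_rows)            # 0
```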

To make this work, we need a trick like the one below. The sym() function converts a string to a symbol, much like base R’s as.name(). The !! operator then unquotes that symbol, so that it is inserted into the expression and evaluated as the column gender rather than taken as the literal name var.

var <- "gender" # this is a string
val <- "female" # this is a string
df %>%
  filter(., !!sym(var) == val)

## id gender
## 1 2 female
## 2 4 female
## 3 5 female
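The same trick carries over to the two situations mentioned earlier, a function and a loop. Below is a minimal sketch; filter_by() is a hypothetical helper written for this illustration, not a dplyr function.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

# Hypothetical helper: keep the rows where the column named by the
# string `var` equals `val`
filter_by <- function(data, var, val) {
  data %>%
    filter(!!sym(var) == val)
}

females <- filter_by(df, "gender", "female")

# The same helper inside a loop: one subset per gender value
subsets <- list()
for (g in c("male", "female")) {
  subsets[[g]] <- filter_by(df, "gender", g)
}
sapply(subsets, nrow)
```

One caveat with this sketch: if the data frame itself had a column called val, filter() would use that column instead of the function argument.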

Alternatively, you can use the get() function, which returns the value of a named object.

var <- "gender" # this is a string
val <- "female" # this is a string
df %>%
  filter(., get(var) == val)

## id gender
## 1 2 female
## 2 4 female
## 3 5 female
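One thing to keep in mind with get() is that it does not only look at the columns: if no column has the requested name, it can pick up an object of that name from the surrounding environment instead. The sketch below contrasts it with the .data pronoun that dplyr provides, which looks up columns only. The variable wrong and the result names are made up for this illustration, and the exact error message dplyr prints may differ between versions.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

var <- "gender"
val <- "female"

# .data[[var]] looks the column up by its string name
with_pronoun <- df %>% filter(.data[[var]] == val)

# "val" is not a column of df, but it is an object in the environment
wrong <- "val"

# get() finds the variable `val` and compares "female" with "female",
# so every row is kept without any warning
via_get <- df %>% filter(get(wrong) == val)

# .data[[wrong]] refuses to fall back to the environment and errors
via_pronoun <- tryCatch(
  df %>% filter(.data[[wrong]] == val),
  error = function(e) "error: column not found"
)
```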

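The first method works with the NSE of the filter() function by building the expression gender == "female" before it is evaluated, while the second method leaves the expression as it is and looks the column up by name when it is evaluated. Both work just fine.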

If we want to pass a name rather than a string, for example as an argument to our own function, the tidyverse has recently introduced the {{ }} (‘curly-curly’) operator for tidy evaluation.

library(tidyverse)
squirrels <- read_csv(str_c(
  "https://raw.githubusercontent.com/",
  "rfordatascience/tidytuesday/master/",
  "data/2019/2019-10-29/nyc_squirrels.csv"))
count_groups <- function(df, groupvar){
  df %>%
    group_by({{ groupvar }}) %>%
    count()
}
count_groups(squirrels, climbing)

See, now we can pass the variable name climbing to the group_by() function using the {{ }} operator. In the past, we had to use the more cumbersome !! enquo() (quote-unquote) trick to achieve something similar.

count_groups_old <- function(df, groupvar){
  df %>%
    group_by(!! enquo(groupvar)) %>%
    count()
}
count_groups_old(squirrels, climbing)
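To tie the two halves of the post together without downloading anything, here is a small sketch on the toy data frame from the beginning. It puts the name-based version next to a string-based one; count_groups_chr() is a hypothetical variant written for this illustration, and {{ }} assumes rlang 0.4.0 or later is installed.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c("male", "female", "male", "female", "female"))

# Name-based: the caller passes a bare column name
count_groups <- function(df, groupvar) {
  df %>%
    group_by({{ groupvar }}) %>%
    count()
}

# String-based (hypothetical variant): the caller passes the name as a string
count_groups_chr <- function(df, groupvar) {
  df %>%
    group_by(!!sym(groupvar)) %>%
    count()
}

by_name   <- count_groups(df, gender)
by_string <- count_groups_chr(df, "gender")

by_name  # one row per gender, with the row count in column n
```

Which version to use depends on where the column name comes from: a bare name suits interactive use, while a string suits loops and cases where the name is stored in a variable.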

Happy hacking!

