The R package seplyr has a neat new feature: the function seplyr::expand_expr(), which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including: variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of the variables referring to quoted strings and “dereferences” variables thought to be referring to names. The entire expression is then returned as a single string.
This provides a powerful way to easily work complicated expressions into the seplyr data manipulation methods.
The method is easiest to see with an example:
library("seplyr")
## Loading required package: wrapr
ratio <- 2
compCol1 <- "Sepal.Width"
expr <- expand_expr("Sepal.Length" >= ratio * compCol1)
print(expr)
## [1] "Sepal.Length >= ratio * Sepal.Width"
expand_expr() works by capturing the user-supplied expression unevaluated, performing some transformations, and returning the entire expression as a single quoted string (essentially returning new source code).
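To make the mechanism concrete, here is a minimal base-R sketch of the same idea. It is not the seplyr implementation: sketch_expand() is a hypothetical helper of ours, and the rule that strings are re-parsed as R source is an assumption chosen to reproduce the behavior described in this post.

```r
# Hypothetical helper, NOT the seplyr implementation: a base-R sketch of
# "capture unevaluated, transform, return as a string".
sketch_expand <- function(expr, env = parent.frame()) {
  force(env)  # fix the caller's environment before doing anything else
  # assumption: a string is turned back into code by parsing it
  to_lang <- function(s) parse(text = s, keep.source = FALSE)[[1]]
  walk <- function(e) {
    if (is.character(e) && length(e) == 1) {
      # quoted string in the expression: strip one layer of quoting
      return(to_lang(e))
    }
    if (is.name(e)) {
      nm <- as.character(e)
      if (nzchar(nm) && exists(nm, envir = env)) {
        val <- get(nm, envir = env)
        # variable holding a single string: substitute what it refers to
        if (is.character(val) && length(val) == 1) {
          return(to_lang(val))
        }
      }
      # non-string or unbound variables are left alone
      return(e)
    }
    if (is.call(e)) {
      # keep the function/operator, transform only the arguments
      args <- lapply(as.list(e)[-1], walk)
      return(as.call(c(list(e[[1]]), args)))
    }
    e  # numbers and other constants pass through unchanged
  }
  paste(deparse(walk(substitute(expr))), collapse = " ")
}

ratio <- 2
compCol1 <- "Sepal.Width"
sketch_expand("Sepal.Length" >= ratio * compCol1)
## [1] "Sepal.Length >= ratio * Sepal.Width"
```

The sketch covers only the cases discussed in this post; the real function may treat other inputs differently.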
Notice that in the above, one layer of quoting was removed from "Sepal.Length" and the name referred to by “compCol1” was substituted into the expression. “ratio” was left alone as it was not referring to a string (and hence cannot be a name; unbound or free variables are also left alone). So we see that the substitution performed does depend on what values are present in the environment.
If you want to be stricter in your specification, you could add quotes around any symbol you do not want de-referenced. For example:
expand_expr("Sepal.Length" >= "ratio" * compCol1)
## [1] "Sepal.Length >= ratio * Sepal.Width"
After the substitution the returned quoted expression is exactly in the form seplyr expects. For example:
resCol1 <- "Sepal_Long"
datasets::iris %.>%
  mutate_se(., resCol1 := expr) %.>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal_Long
## 1          5.1         3.5          1.4         0.2  setosa      FALSE
## 2          4.9         3.0          1.4         0.2  setosa      FALSE
## 3          4.7         3.2          1.3         0.2  setosa      FALSE
## 4          4.6         3.1          1.5         0.2  setosa      FALSE
## 5          5.0         3.6          1.4         0.2  setosa      FALSE
## 6          5.4         3.9          1.7         0.4  setosa      FALSE
Details on %.>% (dot pipe) and := (named map builder) can be found here and here respectively. The idea is: seplyr::mutate_se(., "Sepal_Long" := "Sepal.Length >= ratio * Sepal.Width") should be equivalent to dplyr::mutate(., Sepal_Long = Sepal.Length >= ratio * Sepal.Width).
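As a quick, hedged check of that equivalence, the sketch below computes the new column both ways and compares the results. It assumes seplyr and dplyr are both installed; the variable names via_seplyr and via_dplyr are ours.

```r
library("seplyr")  # also attaches wrapr, which supplies :=

ratio <- 2

# string-based (standard evaluation) form
via_seplyr <- seplyr::mutate_se(
  datasets::iris,
  "Sepal_Long" := "Sepal.Length >= ratio * Sepal.Width")

# ordinary dplyr form of the same calculation
via_dplyr <- dplyr::mutate(
  datasets::iris,
  Sepal_Long = Sepal.Length >= ratio * Sepal.Width)

# compare only the computed column, to stay independent of class details
identical(via_seplyr$Sepal_Long, via_dplyr$Sepal_Long)
```

Comparing just the computed column keeps the check focused on the expression itself.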
seplyr also provides a number of seplyr::*_nse() convenience forms wrapping all of these steps into one operation. For example:
datasets::iris %.>%
  mutate_nse(., resCol1 := "Sepal.Length" >= ratio * compCol1) %.>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal_Long
## 1          5.1         3.5          1.4         0.2  setosa      FALSE
## 2          4.9         3.0          1.4         0.2  setosa      FALSE
## 3          4.7         3.2          1.3         0.2  setosa      FALSE
## 4          4.6         3.1          1.5         0.2  setosa      FALSE
## 5          5.0         3.6          1.4         0.2  setosa      FALSE
## 6          5.4         3.9          1.7         0.4  setosa      FALSE
To use string literals you merely need one extra layer of quoting:
"is_setosa" := expand_expr(Species == "'setosa'")
##               is_setosa
## "Species == \"setosa\""
datasets::iris %.>%
  transmute_nse(., "is_setosa" := Species == "'setosa'") %.>%
  summary(.)
##  is_setosa
##  Mode :logical
##  FALSE:100
##  TRUE :50
The purpose of all of the above is to mix names that are known while we are writing the code (these are quoted) with names that may not be known until later (i.e., column names supplied as parameters). This allows the easy creation of useful generic functions such as:
countMatches <- function(data, columnName, targetValue) {
  # extra quotes to say we are interested in value, not de-reference
  targetSym <- paste0('"', targetValue, '"')
  data %.>%
    transmute_nse(., "match" := columnName == targetSym) %.>%
    group_by_se(., "match") %.>%
    summarize_se(., "count" := "n()")
}

countMatches(datasets::iris, "Species", "setosa")
## # A tibble: 2 x 2
##   match count
##   <lgl> <int>
## 1 FALSE   100
## 2 TRUE     50
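One caveat on the extra quoting step: wrapping the value with paste0() is fine for "setosa", but it would not yield a valid string literal if the target value itself contained a double quote. A possible alternative, shown here as a sketch and not as part of seplyr, is base R's encodeString(), which adds the quotes and escapes any embedded ones. The helper name quote_value() is ours.

```r
# Hypothetical helper: add one layer of quoting, escaping embedded quotes
quote_value <- function(x) encodeString(x, quote = '"')

# same text as paste0('"', "setosa", '"')
quote_value("setosa")

# a value with embedded double quotes still parses back to the original
tricky <- 'say "hi"'
round_trip <- eval(parse(text = quote_value(tricky))[[1]])
identical(round_trip, tricky)
```

Inside countMatches() this would mean building targetSym with quote_value(targetValue); we have only checked the base-R round trip here, not that variant of the function.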
The purpose of the seplyr string system is to pull off quotes and de-reference indirect variables. So, you need to remember to add enough extra quotation marks to prevent this where you do not want it.