Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Another R tip. Need to replace a name in some R code, or make R code reusable? Use wrapr::let().
Here is an example involving dplyr.
Let’s look at some example data:
```r
library("dplyr")
library("wrapr")

# Look at three columns of the starwars example data shipped with dplyr
starwars %>%
  select(., name, homeworld, species) %>%
  head(.)
# # A tibble: 6 x 3
#   name           homeworld species
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human
```
For the “%>%/.” notation, please see R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines. Also, though we will not use it here, we feel that separating argument types (data versus columns) in select() is much more comprehensible, and is made easy by qc() notation such as “select(., qc(name, homeworld, species))”.
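To illustrate what qc() notation does, here is a simplified base-R stand-in. The name qc_sketch is hypothetical and this is not wrapr's implementation (the real wrapr::qc() handles more cases); it assumes every argument is a bare name and simply turns the unevaluated names into strings, so a column list becomes an ordinary character value.

```r
# Hypothetical, simplified stand-in for wrapr::qc() (not wrapr's code).
# Assumes every argument is a bare name such as name or homeworld.
qc_sketch <- function(...) {
  # Capture the unevaluated arguments and drop the leading `list` symbol
  args <- as.list(substitute(list(...)))[-1]
  # Turn each captured symbol into its string form
  vapply(args, deparse, character(1))
}

qc_sketch(name, homeworld, species)
# returns c("name", "homeworld", "species")
```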
Now let’s change the name of one column. The challenge: neither the old column name nor the new name is known when the code is written (a common problem when writing reusable functions or code).
Suppose the remapping is specified in variables, as below.
```r
newname <- "genus"
oldname <- "species"
```
We can work with column names held as values by using wrapr::let(), as shown here.
```r
# NEWNAME and OLDNAME are placeholder names; let() replaces them with
# the values of newname ("genus") and oldname ("species") before running the block
let(
  alias = c(NEWNAME = newname, OLDNAME = oldname),
  starwars %>%
    rename(., NEWNAME = OLDNAME) %>%
    select(., name, homeworld, NEWNAME) %>%
    head(.)
)
#   name           homeworld genus
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human
```
The merit of the above notation is that the exact column names ("species" and "genus") may come from variables, and do not need to be known to the programmer writing the let()-block. There are other methods that attempt such substitution (they were publicly pre-announced only after let() had already been publicly announced and distributed on CRAN; so let() is in fact known prior art, despite apparently not being cited). In our experience (and opinion), wrapr::let() is by far the most legible, teachable, and reliable code-rewriting (or meta-programming) tool for this task in R. It is a good choice for part-time R users, and we are working on formal documentation for expert users.
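To make the code-rewriting idea concrete, here is a minimal base-R sketch of the substitution step. The function name let_sketch is hypothetical and this is not how wrapr implements let(); in particular, unlike wrapr::let() (which, as the rename example shows, also rewrites argument names such as NEWNAME =), this sketch only replaces symbols. The built-in mtcars data is used so it runs without extra packages.

```r
# Hypothetical minimal sketch of let()-style code rewriting (not wrapr's implementation).
# alias: named character vector mapping placeholder names to actual names.
# Limitation: only symbols are replaced, not argument names like f(NAME = x).
let_sketch <- function(alias, expr) {
  caller <- parent.frame()         # evaluate the rewritten code where let_sketch was called
  code <- substitute(expr)         # capture the unevaluated expression
  syms <- lapply(alias, as.name)   # e.g. list(COL = quote(mpg))
  # do.call() passes the captured code itself to substitute(), which then
  # replaces every placeholder symbol found in the syms list
  rewritten <- do.call("substitute", list(code, syms))
  eval(rewritten, envir = caller)
}

col <- "mpg"
let_sketch(c(COL = col), mean(mtcars$COL))  # runs mean(mtcars$mpg)
```

The point of the design is that the placeholder COL is an ordinary symbol in the code, so the block reads like hand-written code while the real column name arrives as a string value.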
Another alternative is the seplyr package, which wraps dplyr operators in more standard, value-oriented notation. The above example in seplyr is as follows:
```r
library("seplyr")

# newname and oldname are ordinary string variables here
starwars %>%
  rename_se(., newname := oldname) %>%
  select_se(., c("name", "homeworld", newname)) %>%
  head(.)
```
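For comparison, the same value-oriented rename can be sketched in plain base R with no code rewriting at all, using the names<- replacement function. This assumes an ordinary data.frame; the small toy table below just copies two rows from the starwars output shown earlier and is not the full data set.

```r
# Toy data copied from the first two rows of the starwars output above
d <- data.frame(
  name = c("Luke Skywalker", "C-3PO"),
  homeworld = c("Tatooine", "Tatooine"),
  species = c("Human", "Droid"),
  stringsAsFactors = FALSE
)

newname <- "genus"
oldname <- "species"

# Rename by matching on the old name; both names are plain string values
names(d)[names(d) == oldname] <- newname
d[c("name", "homeworld", newname)]
```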
Let’s finish with an example from the dplyr 0.7.0 announcement. The following is code from that announcement:
```r
my_var <- "homeworld"

starwars %>%
  group_by(.data[[my_var]]) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
# # A tibble: 49 x 3
#   my_var      height  mass
#   <chr>        <dbl> <dbl>
# 1 Alderaan      176.  64.0
# 2 Aleen Minor    79.0 15.0
# ...
```
Notice that the grouping column is incorrectly named “my_var” (this was also noticed elsewhere: 1, 2, 3). This is not harmless: downstream code that refers to the column by its original name, “homeworld”, will fail. The above is possibly not the currently preferred rlang notation, which has been iterating through “!!” and “UQ()” (though I think UQ() is already “soft deprecated”). My theory is that the correct form may be the even more cumbersome “.data[[!!my_var]]”, even though this is not commonly taught. However, even if the original code is indeed “malformed rlang/dplyr” (that is, outside the intended variations of the grammar), notice that the problem was not caught or signaled. And, at least at some point recently, the shorter notation was being taught by the package authors. So it is hard to consider the rlang notation and its teaching settled.
The equivalent let() notation is easy and works correctly.
```r
# MY_VAR is replaced by the value of my_var ("homeworld") before the pipeline runs
let(
  c(MY_VAR = my_var),
  starwars %>%
    group_by(MY_VAR) %>%
    summarise_at(vars(height:mass), mean, na.rm = TRUE)
)
# # A tibble: 49 x 3
#   homeworld   height  mass
#   <chr>        <dbl> <dbl>
# 1 Alderaan      176.  64.0
# 2 Aleen Minor    79.0 15.0
```
The seplyr equivalent is the following:
```r
starwars %>%
  group_by_se(., my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
```
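As a further check that the naming problem is not inherent to passing column names as values, here is a base-R sketch using aggregate(). It swaps in the built-in mtcars data (with cyl playing the role of homeworld) so it runs without dplyr; the point is only that the grouping column keeps its real name.

```r
my_var <- "cyl"  # grouping column chosen at run time (mtcars stands in for starwars)

# mtcars[my_var] is a one-column data frame, so aggregate() keeps the name "cyl"
res <- aggregate(mtcars[c("mpg", "hp")], by = mtcars[my_var], FUN = mean, na.rm = TRUE)
names(res)
# returns c("cyl", "mpg", "hp")
```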
If you absolutely must have “data pronouns” (such as the “.data” notation), they are actually fairly easy to add to classic base-R pipe-enhanced functions. However, we feel most R users avoid needing such pronouns by properly using R's common structured environment-nesting conventions (just as many programmers never feel the need for a “goto” statement when they stick to structured coding conventions).
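As a rough illustration of how little is needed, here is a sketch of a with()-style base-R function that also binds a .data pronoun. The name with_data_pronoun is hypothetical and not from any package; it assumes a plain data.frame and uses only base eval() with an enclosing environment.

```r
# Hypothetical helper (not from any package): evaluate expr against data d,
# with a .data pronoun bound to d for explicit column lookups.
with_data_pronoun <- function(d, expr) {
  caller <- parent.frame()
  # Environment holding only the pronoun; its parent is the caller's environment
  pronoun_env <- list2env(list(.data = d), parent = caller)
  # Name lookup order: columns of d, then .data, then the caller's variables
  eval(substitute(expr), envir = d, enclos = pronoun_env)
}

col <- "mpg"
with_data_pronoun(mtcars, mean(.data[[col]]))  # explicit pronoun lookup
with_data_pronoun(mtcars, mean(mpg))           # ordinary with()-style lookup
```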