Programming over R
R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late, user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have been experimenting with.
Meta-Programming
Meta-programming is a powerful tool that allows one to re-shape a programming language or write programs that automate parts of working with a programming language.
Meta-programming itself has a central contradiction: one hopes nobody else is doing meta-programming, but that they are instead dutifully writing referentially transparent code that is safe to perform transformations over, so that one can safely introduce one's own clever meta-programming. For example: one would hate to lose the ability to use a powerful package such as future because one had already “used up all the referential transparency” for some minor notational effect or convenience.
That being said, R is an open system and it is fun to play with the notation. I have been experimenting with different notations for programming over R for a while, and thought I would demonstrate a few of them here.
Let Blocks
We have been using let to code over non-standard evaluation (NSE) packages in R for a while now. This allows code such as the following:
library("dplyr") library("wrapr") d <- data.frame(x = c(1, NA)) cname <- 'x' rname <- paste(cname, 'isNA', sep = '_') let(list(COL = cname, RES = rname), d %>% mutate(RES = is.na(COL)) ) # x x_isNA # 1 1 FALSE # 2 NA TRUE
let is in fact quite handy notation that will work in a non-deprecated manner with both dplyr 0.5 and dplyr 0.6. It is how we are future-proofing our current dplyr workflows.
Unquoting
dplyr 0.6 is introducing a new execution system (alternately called rlang or tidyeval, see here) which uses a notation more like the following (but with fewer parentheses, and with the ability to control the left-hand side of an in-argument assignment):
beval(d %>% mutate(x_isNA = is.na((!!cname))))
The inability to re-map the left-hand side of the apparent assignment is because the “(!! )” notation doesn’t successfully masquerade as a lexical token valid on the left-hand side of assignments or function argument bindings.
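For comparison, here is a minimal sketch of how the released rlang/tidyeval notation itself addresses the left-hand side, using its ":=" operator and rlang::sym() (this assumes a tidyeval-era dplyr and rlang, and is separate from the beval experiment above):

library("dplyr")
library("rlang")

d <- data.frame(x = c(1, NA))
cname <- 'x'
rname <- paste(cname, 'isNA', sep = '_')

# ":=" lets the computed name appear on the left-hand side of the in-argument assignment.
d %>% mutate(!!rname := is.na(!!sym(cname)))
#    x x_isNA
# 1  1  FALSE
# 2 NA   TRUE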
And there was an R language proposal for a notation like the following (but without the quotes, and with some care to keep it syntactically distinct from other uses of “@”):
ateval('d %>% mutate(@rname = is.na(@cname))')
beval and ateval are just curiosities implemented to try and get a taste of the new dplyr notation, and we don’t recommend using them in production; their ad-hoc demonstration implementations are just not powerful enough to supply a uniform interface. dplyr itself seems to be replacing a lot of R’s execution framework to achieve stronger effects.
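To make the “ad-hoc demonstration” point concrete: the kind of effect an at-notation implies can be mimicked with plain string substitution and evaluation (purely illustrative, not how ateval is actually implemented; we re-use d, cname, and rname from the let example above):

# Illustrative only: simulate "@" substitution by text replacement, then evaluate.
expr_text <- 'd %>% mutate(@rname = is.na(@cname))'
expr_text <- gsub('@rname', rname, expr_text, fixed = TRUE)
expr_text <- gsub('@cname', cname, expr_text, fixed = TRUE)
eval(parse(text = expr_text))
#    x x_isNA
# 1  1  FALSE
# 2 NA   TRUE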
Write Arrow
We are experimenting with “write arrow” (a deliberate homophone of “right arrow”). It allows the convenient storing of a pipe result into a variable chosen by name.
library("dplyr") library("replyr") 'x' -> whereToStoreResult 7 %>% sin %>% cos %->_% whereToStoreResult print(x) ## [1] 0.7918362
Notice the pipe result (0.7918362) is stored in the variable “x”, not in a variable named “whereToStoreResult”. The variable “whereToStoreResult” merely named, parametrically, where to store the value.
This allows code such as the following:
for(i in 1:3) {
  i %->_% paste0('x', i)
}
(Please run the above to see the automatic creation of variables named “x1”, “x2”, and “x3”, storing values 1, 2, and 3 respectively.)
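For intuition, a minimal sketch of how a “write arrow” style operator could be defined follows (purely illustrative; the actual replyr operator is more careful about environments and argument checking):

# Illustrative only: evaluate the right-hand side to get a name, then assign
# the left-hand value to that name in the caller's environment.
`%->_%` <- function(value, name) {
  assign(name, value, envir = parent.frame())
  invisible(value)
}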
We know left-to-right assignment is heterodox, but the notation is very slick if you are consistent with it and add in some formatting rules (such as insisting on a line break after each pipe stage).
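For example, one such formatting convention (a matter of style, not a package requirement) is one pipe stage per line, with the destination name last:

7 %>%
  sin %>%
  cos %->_% whereToStoreResult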
Conclusion
One wants to use meta-programming with care. In addition to bringing in desired convenience, it can have unexpected effects and interactions deeper in a language, or when exposed to other meta-programming systems. This is one reason why a “seemingly harmless” proposal such as “user-defined unary functions” or “at unquoting” takes so long to consider. This is also why new language features are best tried in small packages first (so users can easily choose whether or not to include them in their larger workflow), to drive public request for comments (RFC) processes, or to allow the ideas to evolve (and not be frozen at their first good form). A great example of community-accepted change is Haskell’s switch from request-chaining IO to monadic IO; the first IO system “seemed inevitable” until it was completely replaced.