Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.
Abstractions
A common definition of an abstraction is (from the OSX dictionary):
the process of considering something independently of its associations, attributes, or concrete accompaniments.
In computer science this is commonly taken to mean “what something can be thought to do independent of caveats and implementation details.”
The magrittr abstraction
In R one traditionally thinks of the magrittr "%>%" pipe abstractly in the following way:
Once "library(magrittr)" is loaded we can treat the expression 7 %>% sqrt() as if the programmer had written sqrt(7).
That is the abstraction of magrittr into terms one can reason about and plan over. You think of x %>% f() as a synonym for f(x). This is an abstraction because magrittr is not in fact implemented as a macro source-code re-write, but in terms of function argument capture and delayed evaluation. And as Joel Spolsky famously wrote:
All non-trivial abstractions, to some degree, are leaky.
The magrittr pipe is non-trivial (in the sense of doing interesting work) because it works as if it were a syntax replacement even though you can use it in more places than you could ask for such a syntax replacement. The upside is: magrittr makes two statements behave nearly equivalently. The downside is: we expect this to fail in some corner cases. This is not a criticism; it is as Bjarne Stroustrup wrote:
There are only two kinds of languages: the ones people complain about and the ones nobody uses.
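One such corner case can be sketched as follows. This is a minimal illustration, assuming magrittr is installed; the helper argName() is hypothetical (it is not part of magrittr or of the original examples). It returns the source text of its argument, which a true source-code re-write would preserve but the pipe does not.

```r
library("magrittr")

# Hypothetical helper: returns the source text of its (unevaluated) argument.
argName <- function(x) deparse(substitute(x))

direct <- argName(7)        # the function sees the argument text as written
piped  <- 7 %>% argName()   # the pipe hands the function a placeholder instead

print(direct)
print(piped)
identical(direct, piped)    # FALSE: the two forms are not fully equivalent
```

Functions that only use the value of their argument do not notice the difference; functions that inspect the argument expression do.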
The tidyeval/rlang abstraction
The package dplyr 0.6.* brings in a new package called rlang to supply a capability called tidyeval. Among the abstractions it supplies are operators for quoting and un-quoting variable names. This allows code like the following, where a dplyr::select() takes a variable name from a user supplied variable (instead of the usual explicit take from the text of the dplyr::select() statement).
    library("dplyr")
    packageVersion("dplyr")
    # [1] ‘0.5.0.9004’
    varName = quo(disp)
    mtcars %>% select(!!varName) %>% head()
    #                   disp
    # Mazda RX4          160
    # Mazda RX4 Wag      160
    # Datsun 710         108
    # Hornet 4 Drive     258
    # Hornet Sportabout  360
    # Valiant            225
Notice in the above example we had to specify the abstract varName by calling quo() on a free variable name (disp) and did not take the value from a string. tidyeval is working hard to supply a parametrizable non-standard interface, and it doesn’t look like a standard interface is the central goal. That is: the following is not intended to work:
    varName <- quo(colnames(mtcars)[[1]])
    mtcars %>% select(!!varName) %>% head()
    # Error: colnames(mtcars)[[1]]: must resolve to integer column positions, not string
This is unfortunate, as the main reason you want to parameterize over variable names is that the names are coming from somewhere else, and are likely supplied as strings, not as quosures (which themselves carry details of environment, meaning they are more like bound variables than free variables). I am sure you can convert a string into a column reference in rlang/tidyeval, but it doesn’t seem to be the central use case (or is at least not held out as such in the help and examples).
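For what it is worth, one route from a string to a column reference appears to be rlang::sym(), which builds a symbol from a string that can then be unquoted with !!. The following is only a sketch, assuming an installed dplyr/rlang that supports sym() and !!; it is not taken from the package examples discussed here.

```r
library("dplyr")

varName <- colnames(mtcars)[[1]]   # the column name arrives as a string

# rlang::sym() turns the string into a symbol; !! unquotes it inside select().
res <- mtcars %>% select(!!rlang::sym(varName)) %>% head()
colnames(res)
```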
The wrapr::let() abstraction
Our wrapr package can abstract the recent example (working over strings instead of “quosure” classes) as follows.
The (leaky) abstraction is:
“varName <- 'var'; wrapr::let(VAR=varName, expr(VAR))” is treated as if the user had written expr(var).
This can also be thought of as a form of unquoting, as you do see one set of quotes disappear.
Let’s try it:
    library("wrapr")
    x <- 5
    varName <- 'x'
    let(c(VAR=varName), VAR)
    # [1] 5
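The code re-writing idea behind this abstraction can be sketched in base R. This is only an illustration of the concept using substitute(), not wrapr's actual implementation; the helper name letSketch() is hypothetical.

```r
# Hypothetical helper: replace placeholder symbols in an unevaluated expression
# with the variable names given as strings, then evaluate the rewritten expression.
letSketch <- function(alias, expr, env = parent.frame()) {
  mapping <- lapply(alias, as.name)                      # strings -> symbols
  rewritten <- do.call(substitute, list(expr, mapping))  # VAR -> x in the parse tree
  eval(rewritten, envir = env)
}

x <- 5
varName <- 'x'
result <- letSketch(c(VAR = varName), quote(VAR + 1))    # evaluates x + 1
print(result)
```

Unlike wrapr::let(), this sketch asks the caller to quote() the expression explicitly; it is meant only to show the substitution step.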
Or moving back to our dplyr::select() example:
    varName <- 'disp'
    let(
      c(VARNAME = varName),
      mtcars %>% select(VARNAME) %>% head()
    )
    #                   disp
    # Mazda RX4          160
    # Mazda RX4 Wag      160
    # Datsun 710         108
    # Hornet 4 Drive     258
    # Hornet Sportabout  360
    # Valiant            225
And wrapr::let() can also conveniently handle the “varName <- colnames(mtcars)[[1]]” case.
An issue
dplyr issue 2726 (reproduced below) discusses a very important and interesting issue.
At a cursory glance the two discussed expressions and the work-around may seem alien, artificial, or even silly:
1. (function(x) select(mtcars, !!enquo(x)))(disp)
2. (function(x) mtcars %>% select(!!enquo(x)))(disp)
3. (function(x) { x <- enquo(x); mtcars %>% select(!!x)})(disp)
However, this is actually a very crisp and incisive example. In fact, if rlang/tidyeval were a system up for public revision (such as an RFC or some such proposal) you would expect the equivalence of the above to be part of an acceptance suite.
The first expression looks very much like the rlang/tidyeval package examples and is the “right way” in rlang/tidyeval to send in a column name parametrically. It is in the style preferred by the new package, so by the package's standards it cannot be considered complicated, perverse, or verbose. The second expression differs from the first only by the application of the “magrittr invariant” of “x %>% f() is to be considered equivalent to f(x)”.
The outcome is that the first expression currently executes as expected, and the second expression errors out. This can be considered surprising, as it is not something anticipated in the documentation or recipes for building up tidy expressions. This is a leak in the combined abstractions, something we are told to back away from as it doesn’t work.
The proposed work-around (expression 3) is helpful, but itself demonstrates another leak in the mutual abstractions. Think of it this way: suppose we had started with expression 3 as working code. We would by referential transparency expect to be able to refactor the code and replace x with its value and move from this third working example to the second expression (which happens to fail).
To summarize: expressions 1 and 3 are equivalent. They differ by two refactoring steps (introduction/removal of pipes, and introduction/removal of a temporary variable). But we cannot demonstrate the equivalence by interpolating in two named transformations (going from 1 to 2 to 3, or from 3 to 2 to 1), as the intermediate expression 2 is apparently not valid.
The wrapr::let() version of the issue author’s desired expression 2 is:
(function(x) let(c(X = x), mtcars %>% select(X)))('disp')
Conclusion
wrapr::let() is a useful abstraction:
- It directly takes strings as variable names (the most common source of parametric variable names).
- It is a macro-like replacement and easy to teach as a code re-writing abstraction.
- It has a small interaction surface, and plays well with delayed evaluation packages such as magrittr and dplyr 0.5.0.