More on safe substitution in R
Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and misused. However, that does not mean every use is unsafe or a mistake.
From Advanced R: substitute:
We can confirm the above code performs no substitution:
a <- 1
b <- 2
substitute(a + b + z)
## a + b + z
And it appears that substitute is designed not to take values from the global environment. So, as we see below, what changes substitute’s behavior is not so much which environment we are running in as which environment the values are bound in.
(function() {
  a <- 1
  substitute(a + b + z, environment())
})()
## 1 + b + z
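One way to check this reading is to hand substitute() an explicitly built environment. The following is a small base-R sketch; the names `e`, `local_only`, and `global_only` are illustrative and not from the original post.

```r
# Build an environment that binds only `a` (`e` is an illustrative name).
e <- new.env()
assign("a", 1, envir = e)

# Only symbols bound in `e` itself are replaced; `b` and `z` are left alone.
local_only <- substitute(a + b + z, e)
print(local_only)

# Passing the global environment explicitly leaves every symbol unchanged,
# even when `a` and `b` are defined there.
a <- 1
b <- 2
global_only <- substitute(a + b + z, globalenv())
print(global_only)
```

The first call should replace only `a`, while the second should return the expression untouched, which matches the behavior seen above.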
We can in fact find many simple variations of substitute that work conveniently.
substitute(a + b + z, list(a=1, b=2))
## 1 + 2 + z
substitute(a + b + z, as.list(environment()))
## 1 + 2 + z
Often R’s documentation is a bit terse (or even incomplete) and functions (confusingly) change behavior based on the type of arguments and context. I say: always try a few variations to see if some simple alteration can make "base-R" work for you before giving up and delegating everything to an add-on package.
However, we found we could not use substitute() to implement wrapr::let() effects (that is, re-mapping non-standard interfaces to parametric interfaces). There were some avoidable difficulties regarding quoting and un-quoting of expressions. But the killing issue was: substitute() apparently does not re-map left-hand sides:
# function that prints all of its arguments (including bindings)
f <- function(...) {
  args <- match.call()
  print(paste("f() call is:", capture.output(str(args))))
}

# set up some global variables
X <- 2
B <- 5

# try it
f(X=7, Y=X)
## [1] "f() call is: language f(X = 7, Y = X)"
# use substitute to capture an expression
captured <- substitute(f(X=7, Y=X))

# print the captured expression
print(captured)
## f(X = 7, Y = X)
# evaluate the captured expression
eval(captured)
## [1] "f() call is: language f(X = 7, Y = X)"
# notice above by the time we get into the function
# the function arguments have taken their values first
# from explicit argument assignment (X=7) and then from
# the calling environment (Y=X goes to 2).

# now try to use substitute() to re-map values
xform1 <- substitute(captured, list(X= as.name('B')))

# doesn't look good in printing
print(xform1)
## captured
# and substitutions did not happen as the variables we
# are trying to alter are not free in the word "captured"
# (they are in the expression the name captured is referring to)
eval(xform1)
## f(X = 7, Y = X)
# can almost fix that by calling substitute on the value
# of captured (not the word "captured") with do.call()
subs <- do.call(substitute, list(captured, list(X= as.name('B'))))
print(subs)
## f(X = 7, Y = B)
eval(subs)
## [1] "f() call is: language f(X = 7, Y = B)"
# notice however, only right hand side was re-mapped
# we saw "f(X = 7, Y = B)", not "f(B = 7, Y = B)"
# for some packages (such as dplyr) re-mapping
# left-hand sides is important
# this is why wrapr::let() exists
wrapr::let(
  c(X= 'B'),
  f(X=7, Y=X)
)
## [1] "f() call is: language f(B = 7, Y = B)"
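A likely reason substitute() skips left-hand sides is that argument names are not symbols inside the call; they are stored in the call's names. The sketch below uses only base R to show this and to re-map a name by hand. It is an illustration of the mechanism, not how wrapr::let() is implemented (which, as discussed below, uses string substitution).

```r
# Argument names live in the call's names, not as symbols in its body.
captured <- quote(f(X = 7, Y = X))
print(names(captured))

# Step 1: substitute() re-maps the symbol X (right-hand sides only).
subs <- do.call(substitute, list(captured, list(X = as.name("B"))))

# Step 2: re-map the argument name by editing names() directly.
names(subs)[names(subs) == "X"] <- "B"
print(subs)
```

This two-step approach works for a single flat call, but it would need to recurse through nested calls to be generally useful, which is part of why a dedicated tool is convenient.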
Re-mapping left-hand sides is an important capability when trying to program over dplyr:
suppressPackageStartupMessages(library("dplyr"))

d <- data.frame(x = 1:3)
mapping <- c(OLDCOL= 'x', NEWCOL= 'y')
wrapr::let(
  mapping,
  d %>% mutate(NEWCOL = OLDCOL*OLDCOL)
)
##   x y
## 1 1 1
## 2 2 4
## 3 3 9
wrapr::let() is based on string substitution. This is considered risky. Consider help(substitute, package='base'):
Note
substitute works on a purely lexical basis. There is no guarantee that the resulting expression makes any sense.
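The base R help page for substitute illustrates this note with a replacement that yields a nonsensical assignment. A small sketch along the same lines (the variable names `nonsense` and `result` are illustrative):

```r
# A purely lexical replacement can produce an expression that cannot be evaluated.
nonsense <- substitute(x <- x + 1, list(x = 1))
print(nonsense)

# Evaluating it fails because the assignment target is now a number.
result <- tryCatch(eval(nonsense), error = function(e) "error")
print(result)
```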
And that is why wrapr::let() takes a large number of precautions and vets user input before performing any substitution.
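To give a feel for what such vetting might look like, here is a minimal sketch. The helper `check_mapping()` is hypothetical and is not wrapr's actual code; it only assumes that a reasonable first check is that every key and every replacement is a syntactically valid R name.

```r
# Hypothetical helper: NOT wrapr's actual implementation, only an illustration
# of the kind of input vetting described above.
check_mapping <- function(mapping) {
  keys <- names(mapping)
  if (is.null(keys) || any(is.na(keys)) || any(keys == "")) {
    stop("every mapping entry needs a name")
  }
  values <- as.character(mapping)
  # make.names() leaves a string unchanged only if it is already a
  # syntactically valid, non-reserved R name.
  is_valid_name <- function(s) !is.na(s) & make.names(s) == s
  bad <- c(keys[!is_valid_name(keys)], values[!is_valid_name(values)])
  if (length(bad) > 0) {
    stop("not valid R names: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# A well-formed mapping passes.
ok <- check_mapping(c(OLDCOL = "x", NEWCOL = "y"))

# A replacement that is an expression rather than a name is refused.
rejected <- tryCatch(
  check_mapping(c(OLDCOL = "x + 1")),
  error = function(e) "rejected"
)
```

Refusing anything that is not a plain name keeps arbitrary code from being spliced into the expression, which is the kind of invariant a specialized tool can afford to enforce.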
The idea is: wrapr::let() is more specialized than substitute(), so in addition to attempting extra effects (re-mapping left-hand sides) it can introduce a lot of checks to ensure safe invariants.
And that is a bit of my point: when moving to a package, look for specificity and safety in addition to "extra power." That is how wrapr::let() is designed and why wrapr::let() is a safe and effective tool to add to your production work-flows.