Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R Tip: be wary of “...
“.
The following code example contains an easy error in using the R function unique()
.
vec1 <- c("a", "b", "c") vec2 <- c("c", "d") unique(vec1, vec2) # [1] "a" "b" "c"
Notice none of the novel values from vec2
are present in the result. Our mistake was: we (improperly) tried to use unique()
with multiple value arguments, as one would use union()
. Also notice no error or warning was signaled. We used unique()
incorrectly and nothing pointed this out to us. What compounded our error was R
‘s “...
” function signature feature.
In this note I will talk a bit about how to defend against this kind of mistake. I am going to apply the principle that a design that makes committing mistakes more difficult (or even impossible) is a good thing, and not a sign of carelessness, laziness, or weakness. I am well aware that every time I admit to making a mistake (I have indeed made the above mistake) those who claim to never make mistakes have a laugh at my expense. Honestly I feel the reason I see more mistakes is I check a lot more.
Data science coding is often done in a rush (deadlines, wanting to see results, and so on). Instead of moving fast, let’s take the time to think a bit about programming theory using a very small concrete issue. This lets us show how one can work a bit safer (saving time in the end), without sacrificing notational power or legibility.
A confounding issue is: unique()
failed to alert me of my mistake because, unique()
‘s function signature (like so many R
functions) includes a “...
” argument. I might have been careless or in a rush, but it seems like unique
was also in a rush and did not care to supply argument inspection.
In R
a function that includes a “...
” in its signature will accept arbitrary arguments and values in addition to the ones explicitly declared and documented. There are three primary uses of “...
” in R
: accepting unknown arguments that are to be passed to later functions, building variadic functions, and forcing later arguments to be bound by name (my favorite use). Unfortunately, “...
” is also often used to avoid documenting arguments and turns off a lot of very useful error checking.
An example of the “accepting unknown arguments” use is lapply()
. lapply()
passes what it finds in “...
” to whatever function it is working with. For example:
lapply(c("a", "b"), paste, "!", sep = "") # [[1]] # [1] "a!" # # [[2]] # [1] "b!"
Notice the arguments “"!", sep = ""” were passed on to paste()
. Since lapply()
can not know what function the user will use ahead of time it uses the “...
” abstraction to forward arguments. Personally I never use this form and tend to write the somewhat more explicit and verbose style shown below.
lapply(c("a", "b"), function(vi) { paste(vi, "!", sep = "") })
I feel this form is more readable as the arguments are seen where they are actually used. (Note: this, is a notional example- in practice we would use “paste0(c("a", "b"), "!")
” to directly produce the result as a vector.)
An example of using “...
” to supply a variadic interface is paste()
itself.
paste("a", "b", "c") # [1] "a b c"
Other important examples include list()
and c()
. In fact I like list()
and c()
as they only take a “...
” and no other arguments. Being variadic is so important to list()
and c()
is that is essentially all they do. One can often separate out the variadic intent with lists or vectors as in:
paste(c("a", "b", "c"), collapse = " ") # [1] "a b c"
Even I don’t write code such as the above (that is too long even for me), unless the values are coming from somewhere else (such as a variable). However with wrapr
‘s reduce/expand operator we can completely separate the collection of variadic arguments and their application. The notation looks like the following:
library("wrapr") values <- c("a", "b", "c") values %.|% paste # [1] "a b c"
Essentially reduce/expand calls variadic functions with items taken from a vector or list as individual arguments (allowing one to program easily over variadic functions). %.|%
is intended to place values in the “...
” slot of a function (the variadic term). It is designed for a more perfect world where when a function declares “...
” in its signature it is then the only user facing part of the signature. This is hardly ever actually the case in R
as common functions such as paste()
and sum()
have additional optional named arguments (which we are here leaving at their default values), whereas c()
and list()
are pure (take only “...
“).
With a few non-standard (name capturing) and variadic value constructors one does not in fact need other functions to be name capturing or variadic. With such tools one can have these conveniences everywhere. For example we can convert our incorrect use of unique()
into correct code using c()
.
unique(c(vec1, vec2)) # [1] "a" "b" "c" "d"
In the above code roles are kept separate: c()
is collecting values and unique()
is applying a calculation. We don’t need a powerful “super unique” or “super union” function, unique()
is good enough if we remember to use c()
.
In the spirit of our earlier article on function argument names we have defined a convenience function wrapr::uniques()
that enforces the use of value carrying arguments. With wrapr::uniques()
if one attempts the mistake I have been discussing one immediately gets a signaling error (instead of propagating incorrect calculations forward).
library("wrapr") uniques(c(vec1, vec2)) # [1] "a" "b" "c" "d" uniques(vec1, vec2) # Error: wrapr::uniques unexpected arguments: vec2
With uniques()
we either get the right answer, or we get immediately stopped at the mistaken step. This is a good way to work.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.