
The Case For Using -> In R

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

R has a number of assignment operators (at least “<-”, “=”, and “->”; plus “<<-” and “->>”, which have different semantics).

R style guides routinely insist on “<-” as the only preferred form. In this note we are going to try to make the case for “->” when using magrittr pipelines.


Don Quijote and Sancho Panza, by Honoré Daumier

Assignment in R

R’s preferred assignment operator is “<-”. This is in the popular style guides. If you write using this style you can organize your code so that:

This has some advantages, and is the public style. Also “=” is much harder to use inside R’s base::quote method than “<-”, so there are still cases where the semantics of “=” and “<-” differ (though I think they all involve the distinction between specifying an argument binding and performing an assignment while inside a function call’s argument list).
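The difference inside an argument list can be seen with a few lines of base R. This is a minimal sketch, not taken from the original post: the names arrow_expr, eq_result, eq_expr, env, made_by_eq, and made_by_arrow are made up for illustration, and the exact behavior of quote(x = 5) may vary with R version (recent versions are expected to signal an error), so the sketch records the outcome rather than assuming it.

```r
# "<-" inside quote() is captured as an assignment call.
arrow_expr <- quote(x <- 5)

# "=" directly inside an argument list names an argument instead.
# quote() has no argument called x, so nothing is captured as an
# assignment; tryCatch() records whatever happens instead.
eq_result <- tryCatch(quote(x = 5),
                      error = function(e) "not an assignment")

# Extra parentheses turn "=" back into an assignment expression.
eq_expr <- quote((x = 5))[[2]]

# The same distinction with an ordinary function, evaluated in a
# scratch environment so we can see which variables get created.
env <- new.env()
evalq(mean(x = c(1, 2, 3)), env)    # "=" only names mean()'s argument
made_by_eq <- exists("x", envir = env, inherits = FALSE)
evalq(mean(x <- c(1, 2, 3)), env)   # "<-" assigns x, then passes the value
made_by_arrow <- exists("x", envir = env, inherits = FALSE)
```

Here made_by_eq and made_by_arrow report whether a variable named x was created as a side effect of each call.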

I have previously written that given the choice I prefer “=” for assignment. It has the advantages that:

Now I said “given the choice”, which means that to work with others you have to use “<-” or at least admit that you are being stubborn. I teach “<- for assignment” as I do not wish to set up students for ridicule (and they, being less informed on the history of R, are less equipped to defend themselves on this issue).

That being said I still don’t actually like “<-”. And in fact I am not sure why the R community has so fetishized its use. “<-” comes from an era when it was actually a symbol on the keyboard, and two other S assignment operators from that era (“_” and “:=”) have not survived in the R language (please see here). I think the style is largely enforced as a kind of argot or “inside language” to express loyalty to R.

A deliberately provocative proposal

That being said I have really come to like using R’s “->” operator. I know I can’t always get away with it, but consider the advantage using “->” brings to western readers (meaning users of Greek-derived alphabets): you can then simply read code from left to right. If I am not allowed to use “=” I want something back in exchange, and “->” actually has some interesting advantages. Let us set up a proposal that is admittedly incompatible with my previous argument.

Consider the following statement:

x = 3 + 4

This is read in R, and most common programming languages, as “assign the value of 3 + 4 to the variable x.” We know to read it this way because “assignment has lower operator precedence than plus.” Roughly, this means there are implicit parenthesization rules under which “x=3+4” is actually shorthand for “x=(3+4)” (roughly, because in R explicit use of parentheses also controls the auto-printing behavior of values). But consider the same statement written with “->”:

3 + 4 -> x

The semantics still come from operator precedence rules, but now the syntax is emphasizing the same thing: the calculation happens before (to the left of) the assignment. This may not seem like much to experienced programmers, but that is because so many programming languages use the frankly unnatural “x=3+4” notation (so we are used to it).
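The precedence and auto-printing remarks above can be checked directly in base R. The following is a small sketch; the variable names (same_parse, same_value, vis_plain, vis_parens, y, z) are illustrative and not from the original post.

```r
# The parser rewrites "->" into a call to `<-` with the operands
# swapped, so both spellings produce the same expression.
same_parse <- identical(quote(3 + 4 -> x), quote(x <- 3 + 4))

# Assignment binds more loosely than "+", in either direction.
3 + 4 -> x
y = 3 + 4
same_value <- identical(x, y)

# Explicit parentheses leave the value unchanged but make it visible,
# which is what triggers auto-printing at the console.
vis_plain  <- withVisible(z <- 3 + 4)$visible
vis_parens <- withVisible((z <- 3 + 4))$visible
```

So “->” is a purely syntactic choice: the direction in which the code reads changes, while the expression R evaluates does not.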

A substantial advantage comes when using magrittr pipes in R.

Suppose I write the following magrittr pipeline:

# Count number of NA in columns x,y,
# and z using pure dplyr notation
# or back-end agnostic dplyr code.
# This involves avoiding use of $
# or things like multiple intermediate
# values in dplyr::summarize.
# This is a useful example as
# complete.cases isn't available on
# all dplyr data services.
# ifelse() is to ensure type
# conversions on remote SQL.
library("dplyr")
my_db <- dplyr::src_sqlite(":memory:", create = TRUE)
data.frame(
  x = c(1, 2, 2),
  y = c(3, 5, NA),
  z = c(NA, 'a', 'b'),
  rowNum = 1:3,
  stringsAsFactors = FALSE
) %>%
  copy_to(my_db, ., 'd') %>%
  mutate(nna = ifelse(is.na(x),1,0) +
               ifelse(is.na(y),1,0) +
               ifelse(is.na(z),1,0)) %>%
  arrange(rowNum) -> dres

In this notation we see that “->” is itself a pipe-compatible operator that moves values to variables. The pipeline itself is already moving left to right and top to bottom. Placing the assignment first would give us an ugly two-directional flow.

Non-semantic changes in the pipeline are now syntactically cheap and localized (as they should be). For example: want to land intermediate results for reasons of efficiency or necessary side-effects? Solution: insert “-> varName LINEBREAK varName %>%” at will, as you already do with dplyr::collapse() and dplyr::compute().
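As a rough illustration of the “-> varName LINEBREAK varName %>%” edit, here is a sketch on a small local data frame. It assumes the magrittr package is installed and swaps in base R’s transform() and subset() for the dplyr verbs so that no database is needed; the names d, counted, res_single, and res_split are hypothetical.

```r
library("magrittr")  # provides %>%

d <- data.frame(x = c(1, 2, 2), y = c(3, 5, NA))

# One uninterrupted pipeline.
d %>%
  transform(nna = is.na(x) + is.na(y)) %>%
  subset(nna > 0) -> res_single

# The same pipeline with the intermediate value landed in "counted".
# The only edit is replacing one "%>%" with "-> counted" followed by
# a new line starting "counted %>%".
d %>%
  transform(nna = is.na(x) + is.na(y)) -> counted
counted %>%
  subset(nna > 0) -> res_split

same_result <- identical(res_single, res_split)
```

The edit touches a single line boundary and leaves every other stage of the pipeline exactly as it was.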

The syntax is now working for us instead of against us. I feel that once you start using magrittr pipelines (which are written left to right, as we did here), the next logical step is to use “->” for consistency.

Syntax Matters

The following code has essentially the same semantics as the previous magrittr pipes, without needing a piping operator.

data.frame(
  x = c(1, 2, 2),
  y = c(3, 5, NA),
  z = c(NA, 'a', 'b'),
  rowNum = 1:3,
  stringsAsFactors = FALSE
) -> .
copy_to(my_db, ., 'd2') -> .
mutate(., nna = ifelse(is.na(x),1,0) +
                ifelse(is.na(y),1,0) +
                ifelse(is.na(z),1,0)) -> .
arrange(., rowNum) -> dres

The above code has the advantage that it is easier to debug, in that you can stop at any stage and the intermediate results are convenient to inspect. However, there was no great call for code in this style (or the matching beginning-of-line “. <-” version) prior to the introduction of magrittr. It just isn’t as enjoyable to use a mere coding convention as it is to use magrittr pipe syntax.
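For comparison, the two conventions mentioned here can be written side by side using only base R. This sketch uses a local data frame and base transform() and subset() in place of the dplyr calls; the names d, res_right, and res_left are illustrative.

```r
d <- data.frame(x = c(1, 2, 2), y = c(3, 5, NA))

# End-of-line "-> ." version: each statement reads left to right
# and leaves its result in the variable named ".".
d -> .
transform(., nna = is.na(x) + is.na(y)) -> .
subset(., nna > 0) -> res_right

# Matching beginning-of-line ". <-" version.
. <- d
. <- transform(., nna = is.na(x) + is.na(y))
. <- subset(., nna > 0)
res_left <- .

same_result <- identical(res_right, res_left)
```

In either form you can stop after any statement and inspect “.” to see the intermediate result.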

Conclusion

