R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines
I think this is the R Tip that is going to be the most controversial yet. Its potential pitfalls include: it is a style prescription (which makes it different from, and less immediately useful than, something like R Tip: Force Named Arguments), and it is heterodox (this is not how magrittr/dplyr is taught by the original authors, and not how it is commonly used). However, I have not been at all good at anticipating which tips get which sort of reception (and this valuable feedback, public and private, is part of what I get out of this series).
On to the tip (which only applies if you are a magrittr pipeline user).
R tip: when using magrittr pipelines, consider making them more explicit and more readable (especially to novices) by using explicit dot-arguments throughout.
The advice is: write pipelines that look like this:
suppressPackageStartupMessages(library("dplyr"))

starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)
And avoid overly concise pipelines such as this:
starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head
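As a quick sanity check, the two styles can be compared directly. This is only a sketch: it assumes dplyr is installed, and the variable names explicit, concise, and same are made up for illustration.

```r
suppressPackageStartupMessages(library("dplyr"))

# Explicit style: every step names the piped value as "."
explicit <- starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)

# Concise style: the piped value is inserted implicitly as the first argument
concise <- starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head

# Both pipelines perform the same calls, so the results should match
same <- identical(explicit, concise)
print(same)
```

The tip is therefore about readability only: the explicit dots do not change what is computed in pipelines of this form.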
The guidance is: each step in a simple magrittr pipeline is a function call that has at least one of its arguments directly written as “.”. Example: “atan2(3, .)” is a simple step, but neither “atan” nor “atan2(abs(.), 5)” is a simple step.
The intended point is: the first pipeline is more explicit and regular. This makes it easier to explain and easier for newcomers to read. For pipelines limited to this style, each step runs in sequence, approximately as if the value of the previous step were in a variable named “.”.
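That mental model can be written out literally, since “.” is a legal variable name in R. The following is a minimal sketch using only magrittr and base R; the names piped and sequential are illustrative, and the sequential version is an approximation of what magrittr does, not its actual implementation.

```r
library("magrittr")

# Pipeline form: every step takes the previous value as an explicit "."
piped <- 5 %>%
  atan2(3, .) %>%
  round(., 3)

# Mental model: the same steps run one after another,
# with the previous result stored in a variable literally named "."
. <- 5
. <- atan2(3, .)
. <- round(., 3)
sequential <- .

print(identical(piped, sequential))
```

Reading a simple-step pipeline then amounts to reading a series of ordinary function calls, each of which overwrites “.”.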
Note: the exact magrittr semantics are in fact more detailed than what I just said. The idea is to start newcomers in a sub-dialect of magrittr that has a simpler correct mental model before (or if ever) moving to the full details. The full details are perhaps more than a part-time R user should be expected to remember. It is a bit much to expect a non-expert to always remember that “5 %>% atan2(3, .)” is completely different from “5 %>% atan2(3, abs(.))”, and that “5 %>% {. + 1}” is completely different from “5 %>% (. + 1)”.
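The four expressions above can be tried side by side. This sketch assumes only that magrittr is installed; the failing cases are wrapped in tryCatch() so the script runs to completion, and the variable names are illustrative. Exact error messages vary between magrittr versions, so they are not shown.

```r
library("magrittr")

# Direct dot argument: evaluates atan2(3, 5)
direct <- 5 %>% atan2(3, .)

# Dot appears only inside a nested call: magrittr also inserts the
# left-hand side as the first argument, so this is attempted as
# atan2(5, 3, abs(5)), which fails because atan2() takes two arguments
nested <- tryCatch(5 %>% atan2(3, abs(.)), error = function(e) "error")

# Braces: the block is evaluated with "." bound to 5
braces <- 5 %>% {. + 1}

# Parentheses: the right-hand side is evaluated first and is expected to
# yield a function, which "(. + 1)" does not, so this fails
parens <- tryCatch(5 %>% (. + 1), error = function(e) "error")

# A parenthesized expression that does yield a function works
parens_fn <- 5 %>% (function(x) x + 1)

print(list(direct = direct, nested = nested, braces = braces,
           parens = parens, parens_fn = parens_fn))
```

None of these special cases arise in a pipeline built only from simple steps, which is the point of restricting newcomers to that sub-dialect.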