Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There has been some talk of adding native pipe notation to R (for example here, here, and here).
I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet give a platform for authors to insert their own preferred ad-hoc semantics.
A prominent place where pipe notation is used is the language F#, where it is literally defined in F# itself as:
let (|>) x f = f x
From this simple definition, one versed in the semantics of F# has a chance at inferring the semantics of the pipe.
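For comparison, a rough R analogue of the F# definition might look like the sketch below. The operator name `%|>%` is made up for this illustration; it is not part of base R or magrittr.

```r
# Hypothetical operator, named only for this sketch: apply the
# right-hand function to the left-hand value, as F#'s (|>) does.
`%|>%` <- function(x, f) f(x)

# User-defined %op% operators group left to right,
# so this reads as (4 %|>% sin) %|>% exp.
piped  <- 4 %|>% sin %|>% exp
nested <- exp(sin(4))
print(identical(piped, nested))
```

With a definition this small the behavior can be worked out from ordinary R evaluation rules. For instance, `4 %|>% sin()` fails, because `sin()` is just an ordinary call with a missing argument.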
In the R language, the magrittr package supplies a pipe written as “%>%”. This pipe’s implementation depends on very complicated unevaluated expression capture and direct manipulation of execution environments. Because the magrittr pipe does not inherit its semantics from a simple definition, it is free to pick whatever semantics it wants. For example, all of the following are valid magrittr:
library("magrittr")
4 %>% sin
#> [1] -0.7568025
4 %>% sin()
#> [1] -0.7568025
4 %>% sin(.)
#> [1] -0.7568025
These all seem to be convenient choices, but new users have to memorize them, as they cannot infer them from things they may already know about the R programming language.
Also, the freedom of choice in semantics means many arbitrary choices get made (nothing is apodictic a priori), and you get some debatable choices, such as “data.frame(x=1) %>% dplyr::bind_rows(list(.,.))” having 3 rows.
What I would like, if a native pipe were to be added to R, is for the pipe to be defined as formally equivalent to some larger R expression. Then we could teach it as such, and we would not get any new corner cases or exceptional behaviors. Even if an implementation fails to reach this standard, this way of defining things lets us know how it should have worked (in a perfect world).
A good R pipe operator might have an aspirational definition along the lines of:
"a %.>% b" is to be treated as if the user had written "{ . <- a; b };", with "%.>%" being treated as left-associative and .-side effects removed.
Notice we are not saying “b(a)”, as that doesn’t deal with repeated use of arguments and other cases R users have come to expect. This also allows piping into non-function expressions (a neat feature).
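To make the definition concrete, here is a toy base-R stand-in for such an operator. It is only a sketch under the assumptions stated in the comments, not the implementation of any package: the right-hand side is captured unevaluated and then evaluated in a scratch environment where `.` is bound to the left-hand value.

```r
# Toy stand-in for the aspirational definition; illustrative only.
`%.>%` <- function(a, b) {
  b_expr <- substitute(b)                  # capture b without evaluating it
  env <- new.env(parent = parent.frame())  # scratch scope, so the caller's '.' is untouched
  assign(".", a, envir = env)              # plays the role of: . <- a
  eval(b_expr, envir = env)                # plays the role of: b
}

# Piping into a non-function expression that uses '.' twice.
repeated <- 3 %.>% (. * . + 1)

# Chains group left to right: (4 %.>% sin(.)) %.>% exp(.)
chained <- 4 %.>% sin(.) %.>% exp(.)

print(repeated)
print(identical(chained, exp(sin(4))))
```

This only approximates the block form: any assignment made inside `b` lands in the scratch environment rather than in the caller's, which is one of the details a real definition would have to settle.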
Surprisingly enough, the above actually works. It means a pipeline such as the following:
a <- 4 %.>% sin(.) %.>% exp(.) %.>% cos(.)
is unambiguously meant to be shorthand for the following (ugly) nested code:
a <- { . <- { . <- { . <- 4 ; sin(.) }; exp(.) }; cos(.) };
It does not matter that nobody would write the nested code; that is precisely what we are not asking anybody to do. The point is that a student can check this translation on small examples, and even run both versions.
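The nested form runs as-is in base R, so the translation can be checked without any pipe operator at all. The sketch below compares it against the plain nested call. It also shows the `.` side effect the definition asks to remove: after the block runs, `.` is still bound to the last intermediate value. `local()` is used here only to keep that leftover out of the workspace, and the names `checked`, `direct`, and `leftover_dot` are introduced just for this illustration.

```r
checked <- local({
  # The nested translation of: 4 %.>% sin(.) %.>% exp(.) %.>% cos(.)
  a <- { . <- { . <- { . <- 4 ; sin(.) }; exp(.) }; cos(.) }
  list(
    a = a,
    direct = cos(exp(sin(4))),  # the same computation as ordinary nested calls
    leftover_dot = .            # '.' survives the block: the side effect to be removed
  )
})

print(identical(checked$a, checked$direct))
print(identical(checked$leftover_dot, exp(sin(4))))
```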
Notice in the above ugly nested example that “};” is starting to look a bit like a piping operator. This is calling out two things:
- Piping syntax is largely a convention of using (possibly anonymous) intermediate values instead of nesting calls.
- Piping semantics are largely about sequencing statements. This is the usual observation that “monad” is the expensive way to say “programmable semicolon.”
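One way to see the sequencing reading is to write the pipeline as a plain list of steps folded over a starting value. This is a minimal base-R sketch; the names `steps` and `run_steps` are introduced here for illustration only.

```r
steps <- list(sin, exp, cos)

# Feed each intermediate value to the next step, left to right.
run_steps <- function(start, steps) {
  Reduce(function(value, step) step(value), steps, start)
}

a <- run_steps(4, steps)
print(identical(a, cos(exp(sin(4)))))
```

Nothing here depends on pipe syntax: the order of the list is the order of evaluation, which is all the "semicolon" has to guarantee.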
Roughly if the user is willing to write code such as the following, then they don’t need pipes.
. <- 4
. <- sin(.)
. <- exp(.)
. <- cos(.)
a <- .
Or, as we have observed before, some notations start to look like you already have piping capabilities in base-R (arguing one should give base-R a chance before insisting on extensions).
For example, we can use “;.<-” as a pipe (the first one I noticed, and my attempt to not use left-arrow that often):
a <-{ 4 ->.;.<- sin(.) ;.<- exp(.) ;.<- cos(.) ;.}
Or we can use “->.;” as a pipe:
a <-{ 4 ->.; sin(.) ->.; exp(.) ->.; cos(.) }
I think this last one is actually pretty if we go all-in with right-assignment:
4 ->.; sin(.) ->.; exp(.) ->.; cos(.) -> a
The remaining strong objections to this Bizarro Pipe notation (in my mind) are:
- “->.;” is an ugly glyph. This is because its representation is its implementation.
- This pipe isn’t very compatible with left-assignment (R’s preferred assignment) without adding additional blocks.
Roughly: introducing new notation need not be as disruptive as introducing new semantics. Also conventions can have great advantage, even if they do not have language assistance or enforcement (though such things are good).