Site icon R-bloggers

magrittr and wrapr Pipes in R, an Examination

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Let’s consider piping in R both using the magrittr package and using the wrapr package.

magrittr pipelines

The magittr pipe glyph “%>%” is the most popular piping symbol in R.
magrittr documentation describes %>% as follow.

Basic piping:

  • x %>% f is equivalent to f(x)
  • x %>% f(y) is equivalent to f(x, y)
  • x %>% f %>% g %>% h is equivalent to h(g(f(x)))

The argument placeholder

  • x %>% f(y, .) is equivalent to f(y, x)
  • x %>% f(y, z = .) is equivalent to f(y, z = x)

Re-using the placeholder for attributes

It is straight-forward to use the placeholder several times in a right-hand side expression. However, when the placeholder only appears in a nested expressions magrittr will still apply the first-argument rule. The reason is that in most cases this results more clean code.

x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = nrow(x))

The behavior can be overruled by enclosing the right-hand side in braces:

x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = nrow(x))

That is a bit of simplification, but is the taught mental model.
Grolemund, Wickham, R for Data Science, O’Reilly Media, 2017; “Pipes” describes the magrittr pipe as follows.

 foo_foo %>%
   hop(through = forest) %>%
   scoop(up = field_mouse) %>%
   bop(on = head)

[…]

The pipe works by performing a “lexical transformation”: behind the scenes, magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object. When you run a pipe like the one above, magrittr does something like this:

 my_pipe <- function(.) {
   . <- hop(., through = forest)
   . <- scoop(., up = field_mice)
   bop(., on = head)
 }
 my_pipe(foo_foo)

Roughly they are saying x %>% f(ARGS) can be considered shorthand for { . <- x; f(., ARGS) } where the evaluation in question happens in a temporary environment.

A Mental Model for Planning Pipelines

To safely and confidently use piping one must eventually know what all of the commonly used related notations mean. For example it is important to know what each of the following evaluate to:

  • 5 %>% sin: the notation demonstrated in the magrittr excerpt.
  • 5 %>% sin(): possibly the notation one would abstract from the R for Data Science excerpt.
  • 5 %>% sin(.): the notation we recomend (especially for the part time R user).

Also, there are questions of how one pipes into general expressions (instead of names, functions, or partially specified function evaluation signatures).
These may seem like details: but they are the steps required to move from copying code from examples and hoping it works (a state of learned helplessness, especially when simple variations fail) or having an effective (even if approximate) mental model for the operators one has decided to work with and plan over.

wrapr pipelines

wrapr supplies its own piping glyph: “dot pipe” %.>%. wrapr’s goal is to supply an operator that is a regular and safe with a %.>% b being approximately syntactic sugar for { . <- a; b } (with, visible side-effects, i.e. we can actually see the “.” assignment happen).

library("wrapr")

# calculate sin(5)
5 %.>% sin(.)
## [1] -0.9589243
# 5 left in dot, a visible side-effect
print(.)
## [1] 5
# clear dot, so no later failing example 
# falsely appears to work
rm(list = ".")

We think wrapr piping is very comprehensible (non-magic) expression oriented pipe with a few rules and additional admonitions:

  • Use explicit dots, i.e. write 5 %.>% sin(.) and not 5 %.>% sin() or 5 %.>% sin. It good to make it obvious to the reader that “.” is a free-name in the right-hand side expression, allowing the easy application of the convention of treating the right-hand side expression as an implicit function of “.”.
  • You get some free de-referencing such as in 5 %.>% sin and function application as in 5 %.>% function(x) { sin(x) }.
  • Outer parentheses do not change meaning (as is commonly the case outside pipelines, modulo R’s visibility controls).
  • Outer braces treat contents as raw statements, turning off wrapr convenience transforms and safety checking. This is compatible with the subtle R convention that brace-blocks {} are considered more opaque and not as eagerly looked into as parenthesized expressions (one such example can be found here).
  • wrapr is grammar in the sense some statements are deliberately not part of the accepted notation. Some of the “errors” in the next set of examples are in fact wrapr refusing certain pipelines.
  • Advanced users can extend wrapr by using R S3 methodology to specify their own rules for various classes (such as building pipable ggplot2 code). Technical details can be found here.

Examples

Let’s consider the following attempts of writing piped variations of sin(5) in both magritter and wrapr notations.

exprs = c(
  "5 PIPE_GLYPH sin",
  "5 PIPE_GLYPH sin()",
  "5 PIPE_GLYPH sin(.)",
  "5 PIPE_GLYPH base::sin",
  "5 PIPE_GLYPH base::sin()",
  "5 PIPE_GLYPH base::sin(.)",
  "5 PIPE_GLYPH ( sin )",
  "5 PIPE_GLYPH ( sin() )",
  "5 PIPE_GLYPH ( sin(.) )",
  "5 PIPE_GLYPH { sin }",
  "5 PIPE_GLYPH { sin() }",
  "5 PIPE_GLYPH { sin(.) }",
  "5 PIPE_GLYPH function(x) { sin(x) }",
  "5 PIPE_GLYPH ( function(x) { sin(x) } )",
  "5 PIPE_GLYPH { function(x) { sin(x) } }",
  "f <- function(x) { sin(x) }; 5 PIPE_GLYPH f"
  )

The point is in a room full of students in a lab setting if you show them “5 %>% sin” some of them are going to try variations or have variations from their work that are important to them. This possibly includes: package-qualifying the function name, wrapping expressions in parenthesis, altering arguments, building functions, and retrieving functions from data structures. The pipeline (for convenience) tries to lower the distinctions between expressions, functions, and function names. However the pipeline notation does not completely eliminate the differences.
A non-expert magrittr/dplyr user might expect all the pipe examples we are about to discuss to evaluate to sin(5) = -0.9589243. As R is routinely used by self-described non-programmers (such as scientists, analysts, and statisticians) the non-expert or part time R user is a very important class of R users (and in fact distinct from beginning R users). So how a system meets or misses simplified expectations is quite important in R.
To run our examples we will use a fairly involved function work_examples() that takes the vector of examples and returns an annotated data.frame of evaluation results. For completeness this code is given here, but can be safely skipped when reading this article.

Now we can work our examples, and return the comparison in tabular format.

work_examples(exprs, sin(5)) %.>%
  knitr::kable(., format = "html", escape = FALSE) %.>%
  column_spec(., 1:4, width = "1.75in") %.>%
  kable_styling(., "striped", full_width = FALSE)
magrittr expr magrittr res wrapr expr wrapr res
5 %>% sin -0.959 5 %.>% sin -0.959
5 %>% sin() -0.959 5 %.>% sin() wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “sin()”, please use sin(.)).
5 %>% sin(.) -0.959 5 %.>% sin(.) -0.959
5 %>% base::sin unused argument (sin) 5 %.>% base::sin -0.959
5 %>% base::sin() -0.959 5 %.>% base::sin() wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “base::sin()”, please use base::sin(.)).
5 %>% base::sin(.) -0.959 5 %.>% base::sin(.) -0.959
5 %>% ( sin ) -0.959 5 %.>% ( sin ) -0.959
5 %>% ( sin() ) 0 arguments passed to ‘sin’ which requires 1 5 %.>% ( sin() ) wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “sin()”, please use sin(.)).
5 %>% ( sin(.) ) object ‘.’ not found 5 %.>% ( sin(.) ) -0.959
5 %>% { sin } .Primitive(“sin”) 5 %.>% { sin } .Primitive(“sin”)
5 %>% { sin() } 0 arguments passed to ‘sin’ which requires 1 5 %.>% { sin() } 0 arguments passed to ‘sin’ which requires 1
5 %>% { sin(.) } -0.959 5 %.>% { sin(.) } -0.959
5 %>% function(x) { sin(x) } Anonymous functions myst be parenthesized 5 %.>% function(x) { sin(x) } -0.959
5 %>% ( function(x) { sin(x) } ) -0.959 5 %.>% ( function(x) { sin(x) } ) -0.959
5 %>% { function(x) { sin(x) } } function (x) { sin(x) } 5 %.>% { function(x) { sin(x) } } function (x) { sin(x) }
f <- function(x) { sin(x) }; 5 %>% f -0.959 f <- function(x) { sin(x) }; 5 %.>% f -0.959

As can now see, some statements were not roughly equivalent to sin(5).
One related case to consider is the following (which we run by hand, as it seems to default knitr or kableExtra html styling, note: the “‘\[’” and other formatting errors are an artifacts of HTML quoting/rendering, and not part of the expressions):

c("lst <- list(h = sin); 5 PIPE_GLYPH lst$h",
  "lst <- list(h = sin); 5 PIPE_GLYPH lst$h()",
  "lst <- list(h = sin); 5 PIPE_GLYPH lst$h(.)",
  "lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']]",
  "lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']]()",
  "lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']](.)") %.>%
  work_examples(., sin(5)) %.>%
  knitr::kable(., format = "html", escape = FALSE)
magrittr expr magrittr res wrapr expr wrapr res
lst <- list(h = sin); 5 %>% lst$h 3 arguments passed to ‘$’ which requires 2 lst <- list(h = sin); 5 %.>% lst$h -0.959
lst <- list(h = sin); 5 %>% lst$h() -0.959 lst <- list(h = sin); 5 %.>% lst$h() wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “lst$h()”, please use lst$h(.)).
lst <- list(h = sin); 5 %>% lst$h(.) -0.959 lst <- list(h = sin); 5 %.>% lst$h(.) -0.959
lst <- list(h = sin); 5 %>% lst[[‘h’]] incorrect number of subscripts lst <- list(h = sin); 5 %.>% lst[[‘h’]] -0.959
lst <- list(h = sin); 5 %>% lst[[‘h’]]() -0.959 lst <- list(h = sin); 5 %.>% lst[[‘h’]]() wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “lst[[”h“]]()”, please use lst[[“h”]](.)).
lst <- list(h = sin); 5 %>% lst[[‘h’]](.) -0.959 lst <- list(h = sin); 5 %.>% lst[[‘h’]](.) -0.959

Analysis

magrittr Results

The magrittr exceptions include the following.

  • :: is a function, as so many things are in R. So base::sin is not really the package qualified name for sin(), it is actually shorthand for `::`("base", "sin") which is a function evaluation that performs the look-up. So 5 %>% base::sin expands to an analogue of . <- 5; `::`(., "base", "sin"), leading to the observed error message.
  • () is magrittr’s “evaluate before piping into” notation, so 5 %>% ( sin() ) and 5 %>% ( sin(.) ) both throw an error as evaluation is attempted before any alteration of arguments is attempted.
  • {} is magrittr’s “treat the contents as raw statements” notation (which is not in fact magrittr’s default behavior). Thus magrittr’s function evaluation signature alteration transforms are not applied to 5 %>% { sin } or 5 %>% { sin() }.

Again, the above are not magrittr bugs, they are just how magrittr’s behavior differs from a very regular or naive internalization of magrittr rules. Notice neither of “()” nor “{}” are neutral notations in magrittr (the first adds an extra evaluation, and second switches to an expression mode with fewer substitutions). Also note the above is an argument for preferring “sin(.)” to “sin()”, or “sin”; as “sin(.)” had the most regular magrittr behavior (not changing with the introduction of “()”, “{}”, or “base::”).
Regularity is especially important for part time users, as you want reasonable variations of what is taught to work so that experimentation is positive and not an exercise in learned helplessness. It is convenient when your tools happen to work the way you might remember.

wrapr Results

The wrapr error messages and non-numeric returns are driven by the following:

  • 5 %.>% sin() is not an allowed wrapr notation. The wrapr philosophy is not to alter evaluation signatures. The error message is signalling that the statement is not valid wrapr grammar (not well formed in terms of wrapr rules). Notice the error message suggests the alternate notation sin(.). Similar rules apply for base::sin(). Then intent is that outer parenthesis are non-semantic, they do not change change wrapr pipe behavior.
  • 5 %.>% { sin } returns just the sin function. This is because {} triggers wrapr’s “leave the contents alone” behavior.

The user only encounters two exceptions in the above variations. The first is “don’t write sin()”, which comes with a clear error message and help (“try sin(.)”). The second is “outer {} treats its contents as raw statements, turning off transforms and checking.
wrapr is hoping to stay close the principle of least surprise.
The hope is that wrapr piping is easy, powerful, useful, and not too different than a %.>% b being treated as almost syntactic sugar for { . <- a; b }.

Aesthetics

An obvious down-side of wrapr piping is the excess dots both in the operator and in the evaluation arguments. We strongly feel the extra dots in the evaluation arguments is actually a good trade in losing some conciseness in exchange for useful explicitness. We do not consider the extra dot in the pipe operator to be a problem (especially if you bind the operator to a keyboard shortcut). If the extra dot in the pipe operator is such a deal-breaker, consider that it could be gotten rid of by copying the pipe operator to your notation of choice (such as executing `%>%` <- wrapr::`%.>%` or `%.%` <- wrapr::`%.>%` at the top of your work). However such re-mappings are needlessly confusing and it is best to use the operator glyph that wrapr directly supplies.

Non-function examples

We can also try a few simpler expressions, that do not have an explicit function marker such as sin(.).

c("5 PIPE_GLYPH 1 + .",
  "5 PIPE_GLYPH (1 + .)",
  "5 PIPE_GLYPH {1 + .}") %.>%
  work_examples(., 6) %.>%
  knitr::kable(., format = "html", escape = FALSE) %.>%
  column_spec(., 1:4, width = "1.75in") %.>%
  kable_styling(., "striped", full_width = FALSE)
magrittr expr magrittr res wrapr expr wrapr res
5 %>% 1 + . attempt to apply non-function 5 %.>% 1 + . wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric, type:double.
5 %>% (1 + .) non-numeric argument to binary operator 5 %.>% (1 + .) 6
5 %>% {1 + .} 6 5 %.>% {1 + .} 6

Some of what caused exceptions above is “5 %ANYTHING% 1 + .” is parsed (due to R’s operator precedence rules) as “(5 %ANYTHING% 1) + .”. So without extra grouping notations (“()” or “{}”) this is not a well-formed pipeline. With wrapr it is safe to add in parenthesis, with magrittr one must use {} (though this can not be used with 5 %>% {sin}).

The Importance of Strictness

For some operations that are unlikely to work close to reasonable user intent wrapr includes checks to warn-off the user. The following shows a few more examples of this “defense of grammar.”

5 %.>% 7
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric,  type:double.
# magrittr's error message for the above is something of the form:
# "Error in function_list[[k]](value) : attempt to apply non-function"

5 %.>% .
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment = pipe_environment, : wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric,  type:double.
# note: the above error message is improved to:
# "wrapr::pipe does not allow direct piping into '.'"
# in wrapr 1.4.1

5 %.>% return(.)
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into certain reserved words or control structures (such as "return").

Throwing errors in these situations is based on the principle that non-signalling errors (often leading to result corruption) are much worse than signalling errors. The “return” example is an interesting case in point.
Let’s first take a look at the effect with magrittr. Suppose we were writing a simple function to find for a positive integer returns the smallest non-trivial (greater than 1 and less than the value in question) positive integer divisor of the value in question (returning NA if there is none such). Such a function might work like the following.

f_base <- function(x) {
  u <- min(ceiling(sqrt(x)), x-1L)
  i <- 2L
  while(i<=u) {
    if((x %% i)==0) {
      return(i)
    }
    i <- i + 1L
  }
  NA_integer_
}

f_base(37)
## [1] NA
f_base(35)
## [1] 5

Now suppose we try to get fancy and use “i %>% return” instead of “return(i)”. This produces a function that thinks all integer are prime. The reason is: magrittr can call the return() function, but in this situation return() can’t manage the control path of the original function.

f_magrittr <- function(x) {
  u <- min(ceiling(sqrt(x)), x-1L)
  i <- 2L
  while(i<=u) {
    if((x %% i)==0) {
      i %>% return
    }
    i <- i + 1L
  }
  NA_integer_
}

f_magrittr(37)
## [1] NA
f_magrittr(35)
## [1] NA

Now suppose we tried the same thing with wrapr pipe and write i %.>% return(.).

f_wrapr <- function(x) {
  u <- min(ceiling(sqrt(x)), x-1L)
  i <- 2L
  while(i<=u) {
    if((x %% i)==0) {
      i %.>% return(.)
    }
    i <- i + 1L
  }
  NA_integer_
}

f_wrapr(37)
## [1] NA
f_wrapr(35)
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into certain reserved words or control structures (such as "return").

wrapr also can not handle return() control flow correctly, however it (helpfully) throws an exception to indicate the problem.

Conclusion

R usually has more than one good way to perform tasks. In this case we talked about two methods of building pipelines in R: magrittr and wrapr. There are more methods (some of which are listed here). Our preferred pipe is the wrapr dot-pipe, and in the of style academic priority we try to credit alternatives and share fair comparisons (as we have done here). Priority is important to respect (as in: magrittr is powerful, popular, came well before, and greatly influences wrapr dot-pipe), but it is not monopoly rights (for example: the public CRAN release/announcement of let(), our popular and still preferred substitution methodology and originally part of replyr, predates the public CRAN release/announcement of rlang/tidyeval code re-writing methods). In client work we use whatever style is most compatible with the client’s work and needs, for example we feel it does not make sense to take a legacy dplyr project and attempt to switch the pipe notation late in the game (and one does not want to needlessly mix notations).

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.