Piping into ggplot2

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

In our wrapr pipe RJournal article we used piping into ggplot2 layers/geoms/items as an example.

Whether the same pipe operator can be used for both data processing steps and ggplot2 layering is a question that comes up from time to time (for example: Why can’t ggplot2 use %>%?). In fact the primary ggplot2 package author wishes that magrittr piping were the composing notation for ggplot2 (though it is obviously too late to change).

There are some fundamental difficulties in trying to use the magrittr pipe in such a way. In particular magrittr looks for its own pipe by name in un-evaluated code, and thus is difficult to engineer over (though it can be hacked around). The general concept is: pipe stages are usually functions or function calls, and ggplot2 components are objects (verbs versus nouns); and at first these seem incompatible.

However, the wrapr dot-arrow-pipe was designed to handle such distinctions.

Let’s work an example.

Suppose we want a single pipe notation to combine data processing and ggplot2 layer composition steps.

The wrapr dot-arrow-pipe performs data processing steps by explicit use of dot and sequencing of expressions. The dot-arrow semantics treats a %.>% b as being very much like {. <- a; b}. This means if the user writes an expression such as a %.>% b(.) the evaluation is very similar to b(a) (with some visible side-effects in the "." variable). Explicit argument notation is in fact a wrapr dot-arrow design principle, though wrapr dot arrow has some short-cuts and convenience methods (such as treating a %.>% f as a %.>% f(.) when f is bound to a function as in 5 %.>% sin). The wrapr dot-arrow-pipe concept is sequencing of expressions, which is related to (but not the same as) composition of functions.
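
A minimal base-R sketch of the {. <- a; b} reading may make this concrete. The operator name %then% is hypothetical and is not part of wrapr. Unlike the real dot-arrow-pipe, it binds "." in a temporary child environment (so it leaves no visible "." behind), and it has none of the shorthand for bare function names.

```r
# Hypothetical toy operator (not wrapr): evaluate the right-hand expression
# with "." bound to the value of the left-hand side.
`%then%` <- function(a, b) {
  rhs <- substitute(b)                 # capture b un-evaluated
  env <- new.env(parent = parent.frame())
  assign(".", a, envir = env)          # the ". <- a" step
  eval(rhs, envir = env)               # the "b" step, run after the assignment
}

r1 <- 5 %then% sin(.)                  # same value as sin(5)
r2 <- 4 %then% sqrt(.) %then% (. + 1)  # sqrt(4), then 2 + 1
```

Each stage here is an expression in ".", not a function, which is the sequencing-of-expressions idea described above.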

With this in mind let’s use the wrapr dot-arrow-pipe for both data processing and ggplot2 layering.

First we load our packages.

library("ggplot2")
library("wrapr")
suppressPackageStartupMessages(library("dplyr"))

Now we tell wrapr what to do when the left argument of a pipe is of class gg. In this case we say: treat a %.>% b as a + b (instead of the default {. <- a; b}). Notice this is an extension that any user or package can make; wrapr has no special adaptation for ggplot2. Instead it supplies sufficient control to be adapted to ggplot2.

apply_left.gg <- function(pipe_left_arg,
                          pipe_right_arg,
                          pipe_environment,
                          left_arg_name,
                          pipe_string,
                          right_arg_name) {
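  # pipe_right_arg arrives un-evaluated; evaluate it in the pipeline's environment
  pipe_right_arg <- eval(pipe_right_arg,
                         envir = pipe_environment,
                         enclos = pipe_environment)
  # compose with ggplot2's own "+" instead of the default dot substitution
  pipe_left_arg + pipe_right_arg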
}
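
The left-dispatch idea can be sketched in base R without wrapr or ggplot2. Everything named here (toy_apply_left, %ltoy%, the class "stack") is hypothetical and only mimics the shape of the mechanism; it is not how wrapr is implemented.

```r
# Hypothetical generic: choose a pipe rule from the class of the left value.
toy_apply_left <- function(left, right_expr, env) UseMethod("toy_apply_left")

# Default rule: bind "." to the left value, then evaluate the right expression.
toy_apply_left.default <- function(left, right_expr, env) {
  local_env <- new.env(parent = env)
  assign(".", left, envir = local_env)
  eval(right_expr, envir = local_env)
}

# Rule for class "stack": evaluate the right side on its own, then append it
# (playing the role that a + b plays for class gg).
toy_apply_left.stack <- function(left, right_expr, env) {
  right <- eval(right_expr, envir = env)
  structure(list(items = c(left$items, right)), class = "stack")
}

`%ltoy%` <- function(a, b) {
  rhs <- substitute(b)    # keep the right side un-evaluated
  env <- parent.frame()   # environment the pipeline was written in
  toy_apply_left(a, rhs, env)
}

n <- 2 %ltoy% (. * 10)                                   # default rule
s <- structure(list(items = "a"), class = "stack") %ltoy% "b" %ltoy% "c"
```

The same operator gives dot-substitution for plain values and append-style composition once the left value has the special class.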

Now we can run a single pipeline that combines data processing steps and ggplot plot construction.

data.frame(x = 1:20) %.>%
  mutate(., y = cos(3*x)) %.>%
  ggplot(., aes(x = x,  y = y)) %.>%
  geom_point() %.>%
  geom_line() %.>%
  ggtitle("piped ggplot2")

Notice the data processing step mutate() and the initial ggplot() step must use ".", as that is how the dot-arrow-pipe normally moves data forward. The ggplot2 geom/layer/item pieces do not use ".". The evaluation rule is that each of these items is evaluated as "pipe_right_arg" before it sees any of the pipeline to its left; this is roughly how ggplot2 handles composition through its override of "+".
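
For readers who have not seen an operator override, here is a small base-R sketch of composition through an S3 method for "+". The classes toy_plot and toy_layer are hypothetical stand-ins and are far simpler than anything in ggplot2.

```r
# Hypothetical toy classes (not ggplot2): a "toy_plot" collects "toy_layer" items.
toy_plot  <- function() structure(list(layers = list()), class = "toy_plot")
toy_layer <- function(name) structure(list(name = name), class = "toy_layer")

# S3 method for "+": each layer is already a fully built object when it arrives
# here, so it never sees the plot it is being added to.
`+.toy_plot` <- function(e1, e2) {
  e1$layers <- c(e1$layers, list(e2))
  e1
}

p <- toy_plot() + toy_layer("point") + toy_layer("line")
layer_names <- vapply(p$layers, function(l) l$name, character(1))
```

Here layer_names records the layers in the order they were added, which is the noun-style composition the pipe rule has to reproduce.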

And this is where we stopped the ggplot2 example in the wrapr pipe RJournal article. The article was about the wrapr dot-arrow-pipe, and not about ggplot2, so it seemed important to move on from the example quickly.

A minor (undiscussed) technical difficulty is: ggplot2 users can store pieces of a plot in variables (or, more correctly, variable names may refer to pieces in environments). They may use this to write more modular plotting code. For example, suppose the user stored the geom_line() specification and the title in two variables as below.

line <- geom_line(linetype = 2)

title <- ggtitle("piped ggplot2",
                 subtitle = "pre-stored title")

We cannot, without a bit more adaptation, pipe into these structures.

data.frame(x = 1:20) %.>%
  mutate(., y = cos(3*x)) %.>%
  ggplot(., aes(x = x,  y = y)) %.>%
  geom_point() %.>%
  line %.>%
  title
## Error: wrapr::apply_right_S4 default called with classes:
##   gg, ggplot 
##  line gg, ggplot 
##   must have a more specific S4 method defined to dispatch

The above error message is intentional. When the right-hand side of a wrapr pipe is the name of an object (instead of an expression or function) wrapr uses a right-dispatch followed by an S4 dispatch to decide what to do. The error message here just means there is no registered right or S4 dispatch for the classes of the objects seen. This is easy to fix by examining the classes of the objects and adding appropriate right-dispatch methods. What we are trying to say: the original left dispatch was enough to deal with plots where all the layers are presented as un-evaluated function calls (the most typical way to use ggplot2); however, layers/geoms/items already evaluated and stored in variables need additional adaptation.

This detail is not fragility of the wrapr dot-arrow-pipe, but an important design distinction. Names and objects on the right side of the pipe can explicitly specify stand-in or surrogate functions or effects. The idea was already present in allowing 5 %.>% sin to be a shorthand for 5 %.>% sin(.). We have merely generalized the surrogate effect and added explicit control points.
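
The surrogate idea can be sketched in base R as follows. All names (toy_apply_right, %rtoy%, the class "suffix") are hypothetical, and the sketch only illustrates the control point; wrapr's actual dispatch is richer (it also falls through to S4, as the error message above shows).

```r
# Hypothetical generic: choose a pipe rule from the class of the right object.
toy_apply_right <- function(left, right) UseMethod("toy_apply_right", right)

toy_apply_right.default <- function(left, right) {
  stop("no right-dispatch rule for class: ",
       paste(class(right), collapse = ", "))
}

# A name bound to a function stands in for the call f(.).
toy_apply_right.function <- function(left, right) right(left)

# A name bound to a "suffix" object stands in for "append this text".
toy_apply_right.suffix <- function(left, right) paste0(left, right$text)

`%rtoy%` <- function(a, b) {
  rhs <- substitute(b)
  env <- parent.frame()
  if (is.name(rhs)) {
    # Bare name on the right: look the object up and dispatch on its class.
    return(toy_apply_right(a, eval(rhs, envir = env)))
  }
  # Any other expression: bind "." and evaluate, as in {. <- a; b}.
  local_env <- new.env(parent = env)
  assign(".", a, envir = local_env)
  eval(rhs, envir = local_env)
}

twice <- function(x) 2 * x
bang  <- structure(list(text = "!"), class = "suffix")

r_fun  <- 5 %rtoy% twice      # treated like twice(5)
r_obj  <- "hi" %rtoy% bang    # handled by the "suffix" rule
r_expr <- 5 %rtoy% (. + 1)    # ordinary dot expression
```

An object with no registered rule reaches the default method and stops with an error, which is the toy analogue of the intentional error shown earlier.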

To extend the ggplot2 effects to use stored values, we first examine the classes of the items we want rules for.

class(line)
## [1] "LayerInstance" "Layer"         "ggproto"       "gg"
class(title)
## [1] "labels"

Notice title is not of class gg, so it will need its own rule.

The new S3 right-dispatch rules are:

apply_right.gg <- function(pipe_left_arg,
                           pipe_right_arg,
                           pipe_environment,
                           left_arg_name,
                           pipe_string,
                           right_arg_name) {
  pipe_left_arg + pipe_right_arg 
}

apply_right.labels <- function(pipe_left_arg,
                               pipe_right_arg,
                               pipe_environment,
                               left_arg_name,
                               pipe_string,
                               right_arg_name) {
  # title objects are not class gg themselves, so check the left side instead
  if (!inherits(pipe_left_arg, "gg")) {
    stop("apply_right.labels expected left argument to be class-gg")
  }
  pipe_left_arg + pipe_right_arg
}

And, just for fun, let’s assign the wrapr dot-arrow-pipe to the shorter pipe name %.%. Unlike the magrittr pipe, the dot-arrow-pipe semantics do not depend on the name of the pipe.

`%.%` <- wrapr::`%.>%`

With these rules in place we can now build up a ggplot2 plot combining data processing, in-line steps, and stored layers/items.

data.frame(x = 1:20) %.%
  mutate(., y = cos(3*x)) %.%
  ggplot(., aes(x = x,  y = y)) %.%
  geom_point() %.%
  line %.%
  title

More details on wrapr dot-arrow-pipe can be found in the RJournal article.
