
%$% : upping your pipe game

[This article was first published on woodpeckR, and kindly contributed to R-bloggers.]

Problem

What do I do when %>% doesn’t work?

Context

I love the %>% pipe. Originally from magrittr, it’s now characteristic of tidy code. Using %>% has revolutionized how I write code in R (pssst! coming soon: an interactive pipe tutorial!). But sometimes the basic pipe falls short.

table() is one of my favorite functions for exploring data in R: it creates a frequency table of values in a vector. I use table() to do sanity checks on my data, make sure that all factor levels are present, and generally get a sense of how my observations are distributed.

A while back, though, I noticed that table() didn’t play nice with the %>% pipe.

I’ve collected some data on my friends’ pets. Here it is (using pseudonyms, in case anyone has a secret pet they don’t want the world to know about…).

(Photo: one of the cats in the data frame below. She would like to hold your hand.)

# Load magrittr
> library(magrittr)

# Create data
> pets <- data.frame(friend = c("Mark", "Mark", "Kyle", "Kyle", "Miranda", "Kayla", "Kayla", "Kayla", "Adriana", "Adriana", "Alex", "Randy", "Nancy"),
                   pet = c("cat", "cat", "cat", "cat", "cat", "dog", "cat", "lizard", "cat", "cat", "dog", "dog", "woodpecker"),
                   main_pet_color = c("brown", "brown", "multi", "multi", "brown", "brown", "brown", "orange", "black", "white", "multi", "white", "multi"))

# Look at the data
> pets
    friend        pet main_pet_color
1     Mark        cat          brown
2     Mark        cat          brown
3     Kyle        cat          multi
4     Kyle        cat          multi
5  Miranda        cat          brown
6    Kayla        dog          brown
7    Kayla        cat          brown
8    Kayla     lizard         orange
9  Adriana        cat          black
10 Adriana        cat          white
11    Alex        dog          multi
12   Randy        dog          white
13   Nancy woodpecker          multi

Unsurprisingly, it looks like there are a lot of cats and dogs! There are also a lot of brown pets and a lot of multicolored ones. Let’s say I want to see a frequency table of the pet colors. I know that I can do this with table(), like so:

# Make a frequency table of pet colors
> table(pets$main_pet_color)
 black  brown  multi orange  white 
     1      5      4      1      2

But if I wanted to use tidy syntax, I’d try to do it this way instead:

# Make a frequency table of pet colors
> pets %>% table(main_pet_color)
Error in table(., main_pet_color) : object 'main_pet_color' not found

What’s up with this? The syntax looks like it should work. main_pet_color is definitely a valid variable name in the data frame pets, and if I had used a different function, like dplyr’s arrange(), I would have had no problems:

# Arrange the data frame by pet color (arrange() comes from dplyr)
> library(dplyr)
> pets %>% arrange(main_pet_color)
    friend        pet main_pet_color
1  Adriana        cat          black
2     Mark        cat          brown
3     Mark        cat          brown
4  Miranda        cat          brown
5    Kayla        dog          brown
6    Kayla        cat          brown
7     Kyle        cat          multi
8     Kyle        cat          multi
9     Alex        dog          multi
10   Nancy woodpecker          multi
11   Kayla     lizard         orange
12 Adriana        cat          white
13   Randy        dog          white

So why doesn’t this work with table()? This problem has driven me crazy on several occasions. I always ended up reverting to the table(pets$main_pet_color) syntax, but I was not happy about it.

Turns out, there’s a simple fix.

Solution

Introducing… a new pipe! %$% is called the “exposition pipe,” according to the magrittr package documentation, and it’s basically the tidy version of the with() function, which I wrote about previously.

If we simply swap out %>% for %$% in our failed code above, it works!

# Make a frequency table of pet colors
> pets %$% table(main_pet_color)
main_pet_color
 black  brown  multi orange  white 
     1      5      4      1      2

Important note: Make sure you have magrittr loaded if you want to use this pipe. dplyr includes the basic %>%, but not the other magrittr pipes.
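
To make the comparison with with() concrete, here is a minimal sketch that builds the same table both ways. It assumes magrittr is installed; the names via_pipe and via_with are just mine for illustration, and the data frame is rebuilt (without the friend column) so the snippet runs on its own.

```r
library(magrittr)

# Same pets and colors as above, rebuilt so this snippet is self-contained
pets <- data.frame(
  pet = c("cat", "cat", "cat", "cat", "cat", "dog", "cat", "lizard",
          "cat", "cat", "dog", "dog", "woodpecker"),
  main_pet_color = c("brown", "brown", "multi", "multi", "brown", "brown",
                     "brown", "orange", "black", "white", "multi", "white",
                     "multi")
)

# Exposition pipe: column names of pets are visible inside table()
via_pipe <- pets %$% table(main_pet_color)

# Base R equivalent: with() evaluates the expression using pets' columns
via_with <- with(pets, table(main_pet_color))

print(via_pipe)
print(via_with)
```

Both calls should print the same counts as the table above. Which one you pick is mostly a matter of whether the rest of your code is written as a chain of pipes.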

Why it works

The traditional pipe, %>%, works by passing whatever is on its left (here, a data frame or tibble) into the next function as the first argument. But that only works if the function you’re piping to is set up to take a data frame/tibble as its first argument and to look up column names inside it!

Functions in the tidyverse, like arrange(), are set up to take this kind of argument, so that piping works seamlessly. But many base R functions take vectors as inputs instead.

That’s the case with table(). When we write table(pets$main_pet_color), the argument pets$main_pet_color is a vector:

# This returns a vector
> pets$main_pet_color
 [1] brown  brown  multi  multi  brown  brown  brown  orange black  white  multi  white 
[13] multi 
Levels: black brown multi orange white

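When we try to pass pets into table() with the pipe, the call becomes table(pets, main_pet_color), as the error message shows. The whole data frame lands in the first argument, and R then goes looking for a standalone object called main_pet_color. Unlike arrange(), table() doesn’t look inside the data frame for column names, so R can’t find the object and throws an error.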
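
Here is a small sketch of that mechanism, assuming magrittr is installed and that no object called main_pet_color exists in your workspace. The names piped_msg, direct_msg and colors are mine. It captures the error from the piped call and from the call written out by hand, then shows one way to keep the basic pipe by handing table() the column itself.

```r
library(magrittr)

# Same pets and colors as above, rebuilt so this snippet is self-contained
pets <- data.frame(
  pet = c("cat", "cat", "cat", "cat", "cat", "dog", "cat", "lizard",
          "cat", "cat", "dog", "dog", "woodpecker"),
  main_pet_color = c("brown", "brown", "multi", "multi", "brown", "brown",
                     "brown", "orange", "black", "white", "multi", "white",
                     "multi")
)

# The pipe puts pets in the first argument slot, so both calls below ask R
# for a free-standing object named main_pet_color. Each one fails, and
# tryCatch() stores the error message instead of stopping the script.
piped_msg <- tryCatch(pets %>% table(main_pet_color),
                      error = function(e) conditionMessage(e))
direct_msg <- tryCatch(table(pets, main_pet_color),
                       error = function(e) conditionMessage(e))
print(piped_msg)
print(direct_msg)

# Piping the column (a vector) rather than the whole data frame works
colors <- pets$main_pet_color %>% table()
print(colors)
```

Piping the column works, but you are back to writing pets$main_pet_color, which is exactly what I was trying to avoid.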

The %$% pipe “exposes” the column names of the data frame to the function you’re piping to, so that a name like main_pet_color is found as a column of pets instead of being searched for as a separate object.
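
Because every column is exposed, the same trick works with more than one column and with other base R functions that take vectors. A quick sketch, assuming magrittr is installed; the names xtab and r are mine, and the second example uses the built-in mtcars data set rather than anything from this post.

```r
library(magrittr)

# Same pets and colors as above, rebuilt so this snippet is self-contained
pets <- data.frame(
  pet = c("cat", "cat", "cat", "cat", "cat", "dog", "cat", "lizard",
          "cat", "cat", "dog", "dog", "woodpecker"),
  main_pet_color = c("brown", "brown", "multi", "multi", "brown", "brown",
                     "brown", "orange", "black", "white", "multi", "white",
                     "multi")
)

# Two exposed columns give a two-way table of pet type by color
xtab <- pets %$% table(pet, main_pet_color)
print(xtab)

# cor() also wants vectors, so expose the columns of mtcars to it
r <- mtcars %$% cor(mpg, wt)
print(r)
```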

Outcome

The exposition pipe is great for integrating non-tidyverse functions into a tidy workflow. The outcome for me is that I can finally make frequency tables to my heart’s content, without “code switching” back from tidy to base R syntax.

Congrats, you made it to the end! Here are some more cats for you.

Resources

magrittr has a couple of other pipes, too: %T>% and %<>%. The package also has some nice aliases for basic arithmetic functions that allow them to be incorporated into a chain of pipes. To read more about these magrittr options, scroll to the bottom of the magrittr vignette. And maybe I’ll post about them later!
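
As a teaser, here is a rough sketch of those two pipes and a couple of the arithmetic aliases, assuming magrittr is installed. The variable names are mine, and the vignette remains the place to go for the details.

```r
library(magrittr)

x <- c(1, 4, 9)

# %<>% pipes x into sqrt() and assigns the result back to x
x %<>% sqrt()

# %T>% runs print() for its side effect, then passes x along unchanged,
# so sum() receives x rather than the return value of print()
total <- x %T>% print() %>% sum()

# multiply_by() and add() are aliases for * and +, handy inside a chain
scaled <- x %>% multiply_by(10) %>% add(1)
print(scaled)
```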

Note: The image at the top of this post was modified from the magrittr documentation.

