magrittr: Simplifying R code with pipes
R is a functional language, which means that your code often contains a lot of (parentheses). Complex code often means nesting those parentheses together, which makes code hard to read and understand. But there's a very handy R package, magrittr by Stefan Milton Bache, which lets you transform nested function calls into a simple pipeline of operations that's easier to write and understand.
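To make the contrast concrete, here is a minimal sketch (assuming the magrittr package is installed) that computes the same result twice: once with nested calls, which you have to read from the inside out, and once as a pipeline that reads left to right. The vector `x` is just illustrative data.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested: read from the inside out (log, then diff, then exp, then round)
nested <- round(exp(diff(log(x))), 2)

# Piped: the same steps, read left to right
piped <- x %>% log() %>% diff() %>% exp() %>% round(2)

identical(nested, piped)
piped
```

Both forms call exactly the same functions in the same order; the pipe only changes how the code is written, not what it computes.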
Hadley Wickham's dplyr package benefits from the %>% pipeline operator provided by magrittr. At useR! 2014, Hadley showed an example of a data transformation written with traditional, nested R function calls:
hourly_delay <- filter(
  summarise(
    group_by(
      filter(flights, !is.na(dep_delay)),
      date, hour
    ),
    delay = mean(dep_delay),
    n = n()
  ),
  n > 10
)
Here's the same code, but rather than nesting one function call inside the next, data is passed from one function to the next using the %>% operator:
hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>%
  filter(n > 10)
You can read this version aloud to get a sense of what it does: the flights data frame is filtered (to remove missing values of the dep_delay variable) and grouped by hour within each day; the mean delay and the number of flights are calculated for each group; and only the hours with more than 10 flights are kept.
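The flights data from the talk isn't included here, so this sketch runs the same pipeline on a tiny hypothetical stand-in data frame with the same column names (`date`, `hour`, `dep_delay`). It assumes dplyr is installed (dplyr re-exports `%>%` from magrittr). The values are made up so that exactly one (date, hour) group has more than 10 non-missing delays.

```r
library(dplyr)  # also makes magrittr's %>% available

# Hypothetical toy data: one group with 12 flights (one delay missing),
# and one group with only 3 flights
flights <- data.frame(
  date      = c(rep("2014-06-30", 12), rep("2014-07-01", 3)),
  hour      = c(rep(9, 12), rep(10, 3)),
  dep_delay = c(1:11, NA, 5, 6, 7)
)

hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%   # drop flights with no recorded delay
  group_by(date, hour) %>%        # one group per hour within each day
  summarise(
    delay = mean(dep_delay),      # mean departure delay in the group
    n = n()                       # number of flights in the group
  ) %>%
  filter(n > 10)                  # keep only hours with more than 10 flights

hourly_delay
```

Only the 9 o'clock group on the first day survives: after dropping the missing value it has 11 flights, with a mean delay of 6.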
You can use the %>% operator with standard R functions, and even with your own functions. The rule is simple: the object on the left-hand side is passed as the first argument to the function on the right-hand side. So:
- my.data %>% my.function is the same as my.function(my.data)
- my.data %>% my.function(arg=value) is the same as my.function(my.data, arg=value)
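Here is a small sketch of those two rules applied to a user-defined function. `scale_by()` is a hypothetical function made up for this example; it assumes only that magrittr is installed.

```r
library(magrittr)

# A hypothetical user-defined function, used only for illustration
scale_by <- function(x, factor = 2) {
  x * factor
}

v <- c(1, 2, 3)

v %>% scale_by               # same as scale_by(v)
v %>% scale_by(factor = 10)  # same as scale_by(v, factor = 10)
v %>% sum                    # works with standard functions too: sum(v)
```

In each case the left-hand side simply becomes the first argument, and any arguments you write on the right-hand side follow it.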
It's even possible to pass the data to an argument other than the first one, by using a . (dot) placeholder to mark the place where the object goes; see the magrittr vignette for details.
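As a brief sketch of the dot placeholder (assuming magrittr is installed, and using the built-in `mtcars` data set): when `.` appears directly as an argument on the right-hand side, magrittr puts the left-hand side there instead of in the first position.

```r
library(magrittr)

# The data frame goes to the `data` argument, not the first (formula) argument
fit <- mtcars %>% lm(mpg ~ wt, data = .)   # same as lm(mpg ~ wt, data = mtcars)

# The vector goes after "item" instead of before it
labels <- c("a", "b", "c") %>% paste("item", .)   # same as paste("item", c("a", "b", "c"))
labels
```

This is handy for functions like lm() whose first argument is not the data.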
This new “pipelining” operation is a really useful addition to the R language, and R developers are starting to use it to make their code simpler to write and maintain. Hadley Wickham's newest R package, tidyr, makes it easy to clean up data sets for analysis by stringing together operations like “gather” and “spread” using the %>% operator.
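For a feel of how tidyr's verbs chain together, here is a minimal sketch assuming the tidyr and magrittr packages are installed. The `scores` data frame is hypothetical toy data. gather() reshapes wide data into long form and spread() turns it back; note that later tidyr releases supersede these two functions with pivot_longer() and pivot_wider(), though gather() and spread() are still available.

```r
library(magrittr)
library(tidyr)

# Hypothetical toy data: one row per student, one column per subject
scores <- data.frame(
  student = c("ann", "bob"),
  math    = c(90, 80),
  art     = c(70, 60)
)

# gather(): wide -> long, one row per (student, subject) pair
long <- scores %>% gather(subject, score, math, art)

# spread(): long -> wide again, one column per subject
wide <- long %>% spread(subject, score)

long
wide
```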
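And speaking of pipelining, you may have been wondering where the name “magrittr” comes from. It's a nod to the Belgian surrealist painter René Magritte, whose painting The Treachery of Images shows a pipe above the caption “Ceci n'est pas une pipe” (“This is not a pipe”).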
The only other question is: will Stefan be making this coffee mug available?
magrittr vignette: Ceci n'est pas un pipe