Self-documenting {ggplot}s thanks to the power of monads!

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]

Hey kid, fancy some self-documenting {ggplots} like this one:

Just read on!

I’ve been working hard on a package that I’ve called {chronicler} (read my post on it here) which allows you to attach a log to the objects you create, thus making it easy to know how some data (for example) has been created. Here’s a quick example and intro to the main features:

suppressPackageStartupMessages(
  library(dplyr)
)
library(chronicler)

# record() decorates functions so they provide enriched output
r_group_by <- record(group_by)
r_select <- record(select)
r_summarise <- record(summarise)
r_filter <- record(filter)

output_pipe <- starwars %>%
  r_select(height, mass, species, sex) %>=% # <- this is a special pipe operator to handle `chronicle` objects
  r_group_by(species, sex) %>=%
  r_filter(sex != "male") %>=%
  r_summarise(mass = mean(mass, na.rm = TRUE))

output_pipe not only has the result of all the {dplyr} operations, but also carries a log with it. Let’s print the object:

output_pipe
## OK! Value computed successfully:
## ---------------
## Just
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55  
## 
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log(.c).

You can access the value with pick(), passing "value" as the second argument:

pick(output_pipe, "value")
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55

and you can read the log with read_log():

read_log(output_pipe)
## [1] "Complete log:"                                                                  
## [2] "OK! select(height,mass,species,sex) ran successfully at 2022-05-15 17:10:43"    
## [3] "OK! group_by(species,sex) ran successfully at 2022-05-15 17:10:43"              
## [4] "OK! filter(sex != \"male\") ran successfully at 2022-05-15 17:10:43"            
## [5] "OK! summarise(mean(mass, na.rm = TRUE)) ran successfully at 2022-05-15 17:10:43"
## [6] "Total running time: 0.0434844493865967 secs"

If you want to understand how this works, I suggest you read the blog post I linked above but also this one, where I explain the nitty gritty, theoretical details behind what {chronicler} does. To make a long story short, {chronicler} uses an advanced functional programming concept called a monad. And using the power of monads, I can now make self-documenting {ggplot2} graphs.
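To give a rough intuition for what "decorating a function so that its output carries a log" means, here is a minimal sketch in base R. It is not how {chronicler} is implemented: toy_record(), toy_bind() and the list structure are hypothetical names made up for this illustration, and error handling (the "Just"/"Nothing" part) is left out entirely.

```r
# Toy illustration only: NOT chronicler's actual implementation.
# A "logged value" is a plain list holding a value and a character log.

# toy_record() wraps a function so that it returns its result plus a log entry.
toy_record <- function(.f) {
  fname <- deparse(substitute(.f))
  force(.f)
  function(...) {
    list(value = .f(...), log = paste("OK!", fname, "ran successfully"))
  }
}

# toy_bind() feeds the value of a logged object into a recorded function
# and concatenates the two logs, so the history travels with the result.
toy_bind <- function(.c, .rf, ...) {
  res <- .rf(.c$value, ...)
  list(value = res$value, log = c(.c$log, res$log))
}

r_sqrt <- toy_record(sqrt)
r_exp  <- toy_record(exp)

out <- toy_bind(r_sqrt(16), r_exp)
out$value  # exp(sqrt(16)), i.e. exp(4)
out$log    # one entry per step, in the order the steps ran
```

The point of the bind step is that recorded functions expect a plain value but return a wrapped one, so something has to unwrap, call, and re-attach the log. In this sketch, that is the role a pipe like %>=% would play.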

The idea was to be able to build a plot in a way similar to how I built that dataset just above, and have a log of how it was created attached to it. The issue is that the function that transforms functions to chronicler functions, record(), does not work on {ggplot2} functions.

This is because the way you create {ggplot2} graphs is by adding layers on top of each other:

library(ggplot2)

ggplot(mtcars) +
  geom_point(aes(mpg, hp))

The + here acts as a way to “add” the geom_point(aes(mpg, hp)) layer on top of the ggplot(mtcars) layer. I remember reading some tweets, quite some time ago, from people asking why %>% couldn’t work with {ggplot2} and if Hadley Wickham, the developer of {ggplot2}, was considering making something like this work:

ggplot(mtcars) %>%
  geom_point(aes(mpg, hp))

because people kept forgetting to use + and kept using %>% instead. The thing is, %>% and + do very different things. %>% takes its left-hand side and passes it as the first argument of the function call on its right-hand side. In other words, this:

a %>% f(b)

is exactly the same as:

f(a, b)
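This equivalence is easy to check. The snippet below is a small sketch that assumes the {magrittr} package is installed (it is where %>% comes from, and {dplyr} re-exports it). round() and the values of a and b are arbitrary choices for the example.

```r
library(magrittr)  # provides %>%

a <- c(1.234, 5.678)
b <- 1

piped  <- a %>% round(b)  # a becomes the first argument of round()
direct <- round(a, b)

identical(piped, direct)  # TRUE
```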

This is not what {ggplot2} functions do. When you call + on {ggplot2} objects, this is NOT what happens:

geom_point(ggplot(mtcars), aes(mpg, hp))
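A quick way to see the difference, assuming {ggplot2} is installed: a layer is built on its own, without any reference to the plot, and + then returns a new plot with that layer appended. The variable names are only for illustration.

```r
library(ggplot2)

base_plot   <- ggplot(mtcars)
point_layer <- geom_point(aes(mpg, hp))  # built without any reference to the plot

# `+` returns a new plot with the layer appended; base_plot is left unchanged.
combined <- base_plot + point_layer

length(base_plot$layers)  # 0
length(combined$layers)   # 1

# Passing the plot as the first argument of geom_point() is not equivalent:
# the plot lands in the `mapping` argument instead. tryCatch() is used here
# so the script keeps running whatever happens; on error, the message is kept.
piped_style <- tryCatch(
  geom_point(ggplot(mtcars), aes(mpg, hp)),
  error = function(e) conditionMessage(e)
)
piped_style
```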

So that’s why %>% cannot be used with {ggplot2} functions, and that’s also why the functions I developed in {chronicler} could not handle {ggplot2} functions either. So I had to provide new functions. The first function I developed is called ggrecord() and it decorates {ggplot2} functions:

r_ggplot <- ggrecord(ggplot)
r_geom_point <- ggrecord(geom_point)
r_labs <- ggrecord(labs)

Now the outputs of these functions are not ggplot objects anymore, but chronicle objects. So to make layering them possible, I also needed to rewrite +. I called my rewritten + %>+%:

a <- r_ggplot(mtcars) %>+%
  r_geom_point(aes(y = mpg, x = hp)) %>+%
  r_labs(title = "Self-documenting ggplot!\nLook at the bottom right",
         caption = "This is an example caption")
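One way to picture what an operator like %>+% has to do: unwrap both sides, add the underlying {ggplot2} objects with the regular +, and keep both logs. The following is a guess at the general idea and not {chronicler}'s code. toy_gg() and %toy+% are hypothetical names, and the list structure is a simplification.

```r
library(ggplot2)

# Toy illustration only: NOT chronicler's actual ggrecord() or %>+%.
# toy_gg() wraps a ggplot2 function so it returns its result plus a log entry.
toy_gg <- function(.f) {
  fname <- deparse(substitute(.f))
  force(.f)
  function(...) {
    list(value = .f(...), log = paste("OK!", fname, "ran successfully"))
  }
}

# `%toy+%` adds the underlying ggplot2 objects with `+` and merges the logs.
`%toy+%` <- function(lhs, rhs) {
  list(value = lhs$value + rhs$value, log = c(lhs$log, rhs$log))
}

t_ggplot     <- toy_gg(ggplot)
t_geom_point <- toy_gg(geom_point)

p <- t_ggplot(mtcars) %toy+% t_geom_point(aes(y = mpg, x = hp))

p$value  # a regular ggplot object, printed as usual
p$log    # one entry per decorated call
```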

Let’s first take a look at a:

a
## OK! Ggplot computed successfully:
## ---------------
## Just

## 
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log(.c).

As expected, a is now an object of type `chronicle`, whose “value” is a ggplot object. But where is the self-documenting part? For this, you use the last piece of the puzzle, document_gg():

document_gg(a)
## OK! Ggplot computed successfully:
## ---------------
## Just

## 
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log(.c).

The caption now contains the log of the plot, making it easily reproducible!
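Writing a log into a caption can be sketched with plain {ggplot2}. This is only an illustration of the idea, not what document_gg() actually does internally: toy_document() is a hypothetical helper, and the log lines are placeholder strings written by hand for the example.

```r
library(ggplot2)

# Hypothetical helper, not part of chronicler: collapse a character vector
# into a single multi-line string and use it as the plot caption.
toy_document <- function(plot, log) {
  plot + labs(caption = paste(log, collapse = "\n"))
}

# Placeholder log lines, written by hand for this example.
log_lines <- c(
  "step 1: ggplot(mtcars)",
  "step 2: geom_point(aes(y = mpg, x = hp))"
)

p <- ggplot(mtcars) + geom_point(aes(y = mpg, x = hp))
documented <- toy_document(p, log_lines)

documented$labels$caption  # the two log lines joined by a newline
```

Note that labs(caption = ...) replaces any caption the plot already had, so a real implementation would need to decide how to combine the log with a user-supplied caption.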

This is still in very early development, but if you want to try it out, you’ll need to install the dev branch of the package.

Any feedback, comments, ideas and pull requests are more than welcome.
