Visualizing package performance using Rperform and Grammar of Graphics
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The greatest value of a picture is when it forces us to notice what we never expected to see.
―John Tukey
Replace ‘picture’ in the above quote with ‘data visualization’ and it still rings true, perhaps even more so. Providing that kind of insight to package developers is exactly what Rperform strives to do through its visualization functions.
Background
If you are new to Rperform, consider reading through its GitHub README first.
In a nutshell, Rperform is an R package that lets package developers track and visualize quantitative performance metrics of their code over time. It focuses on how a package's runtime and memory-usage metrics change across git commits and across git branches.
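To make "runtime and memory usage" concrete, here is a minimal sketch of measuring both for a piece of code using only base R's system.time() and gc(). The helper measure_expr() is hypothetical and is not part of Rperform; Rperform's own measurement code may work quite differently.

```r
# measure_expr() is a hypothetical helper for illustration only; it is NOT
# part of Rperform, whose measurement code may work differently.
measure_expr <- function(expr_fn) {
  gc(reset = TRUE)                   # reset the "max used" statistics
  timing <- system.time(expr_fn())   # user, system and elapsed time
  mem <- gc()                        # memory statistics after the run
  # The "(Mb)" column right after "max used" holds peak memory in Mb since
  # the reset (this includes memory already in use before the call)
  peak_col <- which(colnames(mem) == "max used") + 1
  list(
    elapsed_sec = timing[["elapsed"]],
    peak_mem_mb = sum(mem[, peak_col])
  )
}

res <- measure_expr(function() sort(runif(1e6)))
str(res)
```

Tracking numbers like these for each git commit is the kind of history that Rperform's plots visualize.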
Visualizing package performance across two branches
As discussed in a previous blog post, data visualizations, the UI and Travis CI integration are the focal points of this year's GSoC work on Rperform.
One month into the GSoC period, most of my work so far has gone into improving Rperform's visualization capabilities. A developer can now compare and visualize the performance of a package across two git branches using the plot_branchmetrics() function. Two of its key parameters are branch1 and branch2. It is assumed that branch1 (for example, your development branch) is to be merged into branch2 (for example, the master branch), or that branch1 originated from branch2. The figure below depicts the relationship between two such branches.
The following example from the Rperform wiki shows plot_branchmetrics() in use on the stringr package:
## Warning: Always set the current directory to be the root directory of the package to be tested.
Rperform::plot_branchmetrics(test_path = "tests/testthat/test-interp.r", metric = "memory",
                             branch1 = "rperform_test", branch2 = "master",
                             save_data = FALSE, save_plots = FALSE)
The commit on the left-hand side (LHS) of the vertical line in the above plot is the latest commit on the branch passed as branch2. The right-hand side (RHS) contains commits from the branch passed as branch1, running from branch1's latest commit back to the first commit it has in common with branch2.
To learn more about visualizing your package's performance, check out this GitHub wiki.
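The commit selection described above can be approximated with plain git commands. This is a minimal sketch, not Rperform's internal code: branch_commits() is a hypothetical helper, it assumes git is on your PATH and that it is run from the package's root directory, and whether the shared commit itself belongs on the RHS is an assumption.

```r
# branch_commits() is a hypothetical helper that shells out to git; it is
# NOT how Rperform selects commits internally. Run from the package root.
branch_commits <- function(branch1, branch2) {
  git <- function(...) system2("git", c(...), stdout = TRUE)
  # Latest commit on branch2: the single commit plotted on the LHS
  branch2_head <- git("rev-parse", branch2)
  # First commit common to both branches
  common <- git("merge-base", branch1, branch2)
  # Commits reachable from branch1 but not from branch2, newest first
  only_branch1 <- git("rev-list", paste0(branch2, "..", branch1))
  # Assumption: the shared commit is included at the end of the RHS
  list(lhs = branch2_head, rhs = c(only_branch1, common))
}

# Usage, with the branch names from the stringr example:
# branch_commits("rperform_test", "master")
```

`git rev-list branch2..branch1` is the standard way to ask git for "commits on branch1 that branch2 does not have", which is why it is used here.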
Grammar of Graphics and Interactivity
The Grammar of Graphics (GoG) is a framework that coherently ties together many aspects of designing, implementing, reading, and understanding a graphic.[1] Created by Leland Wilkinson, it is a systematic way of thinking about visualizations. ggplot2, developed by Hadley Wickham, is largely an implementation of Wilkinson's GoG framework. It lets you build a visualization layer by layer, mapping data variables to visual properties (aesthetics), an approach that supports an astounding variety of visualizations. Rperform uses ggplot2 under the hood to create plots such as the one shown above.
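The layer-by-layer idea can be sketched with a plot shaped like the branch comparison above. The data frame holds hypothetical placeholder values (not measurements from stringr or output of Rperform), and the code is only an illustration of the GoG approach, not Rperform's actual plotting code.

```r
library(ggplot2)

# Hypothetical placeholder values for illustration only; NOT real
# measurements from stringr and NOT output of Rperform.
perf <- data.frame(
  commit    = factor(c("master HEAD", "b1-1", "b1-2", "b1-3"),
                     levels = c("master HEAD", "b1-1", "b1-2", "b1-3")),
  branch    = c("master", "rperform_test", "rperform_test", "rperform_test"),
  memory_mb = c(10, 11, 12, 10.5)
)

# Map data variables to aesthetics: x position, y position and colour
p <- ggplot(perf, aes(x = commit, y = memory_mb, colour = branch))
# Layer 1: one point per commit
p <- p + geom_point(size = 3)
# Layer 2: a dashed line separating branch2's latest commit (LHS)
# from branch1's commits (RHS)
p <- p + geom_vline(xintercept = 1.5, linetype = "dashed")
# Axis and legend labels are added with `+` too, but are not layers
p <- p + labs(x = "Commit", y = "Memory (MB)", colour = "Branch")
print(p)
```

Each `+` adds one component to the plot object, so a layer (for example, a trend line) can be added or dropped without touching the rest of the specification.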
When I started thinking about adding interactivity to Rperform's plots, I wanted a solution that kept ggplot2's capabilities and the GoG philosophy. The animint package, developed by Toby Dylan Hocking (one of my mentors), has proven to be a good fit since it works in tandem with ggplot2 to create interactive plots. This part is still a work in progress, but this example gives a taste of how interactivity can help: clicking on a point takes you to the GitHub page of the commit that the point represents.
I will write another post once I finish implementing the functions that return interactive visualizations. That's all for now, folks!
Note:
If you are an R package developer, please try out Rperform on your code and share feedback if you can. Drop me an email, or reach out on Twitter, GitHub or Quora.
If you run into any problems, please open an issue on GitHub.