
Introducing the streamgraph htmlwidget R Package

[This article was first published on rud.is » R, and kindly contributed to R-bloggers.]

We were looking for a different type of visualization for a project at work this past week, and my thoughts immediately gravitated towards streamgraphs. The TL;DR on streamgraphs is that they are generalized versions of stacked area graphs in which the baseline is free to move instead of being pinned to the x axis. They are somewhat controversial but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).

You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R htmlwidget. Thus, the streamgraph package was born.

Making a streamgraph

The package isn’t on CRAN yet, so you have to do the devtools dance:

devtools::install_github("hrbrmstr/streamgraph")

Streamgraphs require a continuous variable for the x axis, and the streamgraph widget/package works with years or dates (support for xts objects and POSIXct types is coming soon). Since they display categorical values in the area regions, the data in R needs to be in long format, which is easy to do with dplyr & tidyr.

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to expand.grid to ensure all categories are represented at every observation (not doing so makes d3.stack unhappy).
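As a rough illustration of that idea (a base R sketch with made-up numbers, not the package's actual internals), the following fills in a missing year/genre combination with a zero count and turns bare years into dates. The column names and the choice of January 1st for each year are assumptions made for the example.

```r
# Toy long-format data: "Drama" has no row for 2001 (hypothetical values)
obs <- data.frame(
  year  = c(2000, 2000, 2001, 2002, 2002),
  genre = c("Action", "Drama", "Action", "Action", "Drama"),
  n     = c(3, 5, 4, 2, 6),
  stringsAsFactors = FALSE
)

# Every combination of x value and category
full_grid <- expand.grid(
  year  = sort(unique(obs$year)),
  genre = sort(unique(obs$genre)),
  stringsAsFactors = FALSE
)

# Left join the observations onto the grid; missing combinations come back as NA
filled <- merge(full_grid, obs, by = c("year", "genre"), all.x = TRUE)
filled$n[is.na(filled$n)] <- 0

# Bare years become dates (assumption: January 1st of each year)
filled$date <- as.Date(sprintf("%04d-01-01", as.integer(filled$year)))

filled <- filled[order(filled$date, filled$genre), ]
print(filled)
```

Every category now has a value at every x position, which is the shape a stacking layout expects.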

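Let’s start by making a streamgraph of the number of movies made per year by genre using the movies dataset (it shipped with ggplot2 when this was written; since ggplot2 2.0.0 it lives in the ggplot2movies package):

library(streamgraph)
library(dplyr)

# movies moved from ggplot2 to the ggplot2movies package in ggplot2 2.0.0
ggplot2movies::movies %>%
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  # wide genre indicator columns -> long (year, genre, value) rows
  tidyr::gather(genre, value, -year) %>%
  group_by(year, genre) %>%
  # count of movies per year and genre, stored in column n
  tally(wt=value) %>%
  streamgraph("genre", "n", "year") %>%
  sg_axis_x(20) %>%
  sg_colors("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")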

Movie count by genre by year

We can also mimic an example from the Name Voyager project (using the babynames R package) but change some of the aesthetics, just to give an idea of how some of the options work:

library(dplyr)
library(babynames)
library(streamgraph)
 
babynames %>%
  filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")

Data-Driven Security Podcast guest+host names by year
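The offset="zero" setting in the example above pins the baseline to the x axis, which gives an ordinary stacked area chart, while a streamgraph lets the baseline float. The base R sketch below, using made-up counts, shows the difference for the simplest floating baseline, a silhouette-style one that centres the total around zero. It illustrates the general idea only; the widget leaves the real layout to D3, and the helper name stack_bands is hypothetical.

```r
# Hypothetical counts: rows are years, columns are categories
counts <- matrix(
  c(2, 4, 6,
    3, 5, 2,
    1, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("2000", "2001", "2002"), c("a", "b", "c"))
)

# Lower and upper edge of each band, stacked on top of a given baseline
# (baseline has one value per row of `values`)
stack_bands <- function(values, baseline) {
  upper <- baseline + t(apply(values, 1, cumsum))
  lower <- cbind(baseline, upper[, -ncol(upper), drop = FALSE])
  dimnames(lower) <- dimnames(values)
  list(lower = lower, upper = upper)
}

# Zero baseline: the ordinary stacked area chart
zero <- stack_bands(counts, baseline = rep(0, nrow(counts)))

# Silhouette-style baseline: centre each year's total around zero
silhouette <- stack_bands(counts, baseline = -rowSums(counts) / 2)

print(zero$upper)
print(silhouette$upper)
```

The thickness of every band (upper minus lower) is the same in both layouts; only the baseline it is stacked on changes.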

There are more examples over at RPubs and GitHub, but I’ll close with a streamgraph of commercial real estate data originally made by Alex Bresler:

dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")
 
dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%
  sg_colors("PuOr") %>%
  sg_legend(show=TRUE, label="Asset class: ")

Commercial Real Estate Transaction Volume by Asset Class Since 2006

While the radical volume change would have been noticeable in almost any graph style, it stands out even more in the streamgraph version, as your eyes naturally follow the curves of the flow.

Fin

While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.

As usual, please file issues and feature requests on GitHub, and leave showcase or general feedback in the comments.

