Plot Multiple Time Series using the flow / inkblot / river / ribbon / volcano / hourglass / area / whatchamacallit plots ~ blue whale catch per country w/ ggplot2
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Ever since I first looked at this NYT visualization by Amanda Cox, I’ve wanted to reproduce it in R. The plot stacks multiple time series on top of one another, with the width of each river/ribbon/hourglass representing the strength of that series at each time. The NYT article used box office revenue as the width of the river. It’s also an interactive web app, thanks to some help from graphic designers.
AFAIK, ggplot2 can stack area plots using geom_area or create flow plots for one set of data using geom_ribbon, but not both. So I wrote a function that transforms the data into the polygon coordinates that geom_polygon needs.
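To make the geom_polygon idea a bit more concrete, here is a minimal sketch of how one closed outline per series can be built. It is not the function from this post: the toy numbers, the zero baseline, and the names years, catch, band_polygon and poly.df are all made up for illustration.

```r
# Toy data (made up for illustration): two series over four time points.
years <- 1931:1934
catch <- cbind(A = c(1, 3, 2, 1), B = c(2, 2, 4, 1))

# The upper edge of each band is the running total across series;
# the lower edge is the upper edge of the band beneath it (0 for the first).
upper <- t(apply(catch, 1, cumsum))
lower <- cbind(0, upper[, -ncol(upper), drop = FALSE])
colnames(lower) <- colnames(upper)

# One closed outline per series: left to right along the lower edge,
# then right to left along the upper edge.
band_polygon <- function(id) {
  data.frame(x = c(years, rev(years)),
             y = c(lower[, id], rev(upper[, id])),
             id = id)
}
poly.df <- do.call(rbind, lapply(colnames(catch), band_polygon))

# Build the plot only if ggplot2 is installed; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(poly.df,
                       ggplot2::aes(x = x, y = y, group = id, fill = id)) +
    ggplot2::geom_polygon(colour = "white")
}
```

Each series contributes twice as many rows as there are time points, because its outline walks out along one edge and back along the other.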
I used blue whale catch data from Masaaki Ishida to illustrate my function. At each time point the stacked series are centered on their mean, which sets the river's position along the y-axis. The data are also smoothed with a spline so the plot looks nicer.
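The centering and smoothing steps can be sketched in a few lines of base R. This is only an illustration on made-up numbers, assuming a matrix with one row per time point and one column per series; years, catch, stacked, centred and smooth are hypothetical names, and n = 50 is an arbitrary choice.

```r
# Toy data (made up for illustration): two series over five time points.
years <- 1931:1935
catch <- cbind(A = c(0, 4, 6, 2, 0), B = c(2, 2, 4, 4, 2))

# Stack: running totals across series at each time point.
stacked <- t(apply(catch, 1, cumsum))

# Centre: subtract each time point's mean boundary, so the stack
# floats around y = 0 instead of sitting on the x-axis.
centred <- stacked - rowMeans(stacked)

# Smooth: interpolate every boundary with a cubic spline evaluated
# at 50 equally spaced points between the first and last year.
smooth <- apply(centred, 2, function(y) spline(x = years, y = y, n = 50)$y)
```

After centering, the boundaries at every time point average to zero, which is what gives the plot its symmetric hourglass look.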
Some links that may be helpful:
- Stack Overflow on reshaping data frames using melt(), cast(), reshape()
- Hadley Wickham on ribbon plots, stacked area plots, density plots
- Lee Byron on the NYT graphic
- Edward Tufte on the NYT graphic
- Creating inkblot charts, Stack Overflow
- Junk Charts on the NYT graphic
- information aesthetics on the NYT graphic
- Havre, S., Hetzler, E., Whitney, P., & Nowell, L. (2002). ThemeRiver: visualizing thematic changes in large document collections. IEEE Transactions on Visualization and Computer Graphics, 8(1), 9-20. DOI: 10.1109/2945.981848
(messy) R Code:
library(ggplot2)

# data: Masaaki Ishida ([email protected])
# http://luna.pos.to/whale/sta.html
head(blue, 2)
##      Season Norway U.K. Japan Panama Denmark Germany U.S.A. Netherlands
## [1,]   1931      0 6050     0      0       0       0      0           0
## [2,]   1932  10128 8496     0      0       0       0      0           0
##      U.S.S.R. South.Africa TOTAL
## [1,]        0            0  6050
## [2,]        0            0 18624

hourglass.plot <- function(df) {
  # drop the time column and order the series alphabetically
  stack.df <- df[,-1]
  stack.df <- stack.df[,sort(colnames(stack.df))]
  # running totals across series for each season (result is transposed)
  stack.df <- apply(stack.df, 1, cumsum)
  # back to one row per season, one column per series
  stack.df <- apply(stack.df, 1, function(x) sapply(x, cumsum))
  # center each season's boundaries on their mean
  stack.df <- t(apply(stack.df, 1, function(x) x - mean(x)))

  # use this for actual data
  ## coords.df <- data.frame(
  ##   x = rep(c(df[,1], rev(df[,1])), times = dim(stack.df)[2]),
  ##   y = c(apply(stack.df, 1, min),
  ##         as.numeric(apply(stack.df, 2, function(x) c(rev(x),x)))[1:(length(df[,1])*length(colnames(stack.df))*2-length(df[,1]))]),
  ##   id = rep(colnames(stack.df), each = 2*length(df[,1])))
  ## qplot(x = x, y = y, data = coords.df, geom = "polygon", color = I("white"), fill = id)

  # use this for smoothed data
  # spline-interpolate each boundary over the seasons
  density.df <- apply(stack.df, 2, function(x) spline(x = df[,1], y = x))
  id.df <- sort(rep(colnames(stack.df), each = as.numeric(lapply(density.df, function(x) length(x$x)))))
  density.df <- do.call("rbind", lapply(density.df, as.data.frame))
  density.df <- data.frame(density.df, id = id.df)
  # polygon outlines: out along one boundary, back along the next
  smooth.df <- data.frame(
    x = unlist(tapply(density.df$x, density.df$id, function(x) c(x, rev(x)))),
    y = c(apply(unstack(density.df[,2:3]), 1, min),
          unlist(tapply(density.df$y, density.df$id, function(x) c(rev(x), x)))[1:(table(density.df$id)[1]+2*max(cumsum(table(density.df$id))[-dim(stack.df)[2]]))]),
    id = rep(names(table(density.df$id)), each = 2*table(density.df$id)))
  qplot(x = x, y = y, data = smooth.df, geom = "polygon", color = I("white"), fill = id)
}

# column 12 is TOTAL, so leave it out of the stack; ggtitle() sets the plot title
hourglass.plot(blue[,-12]) + ggtitle("Blue Whale Catch")
Filed under: ggplot2, R, Whaling