ggplot2: Waterfall Charts
Waterfall charts are often used for analytical purposes in the business setting to show the effect of sequentially introduced negative and/or positive values. Sometimes waterfall charts are also referred to as cascade charts.
In the next few paragraphs I will show how to plot a waterfall chart using ggplot2.
Data
The data is a very small fictional dataset depicting the changes to a company's cash position, taken from a blog post showing how to prepare a waterfall chart in Tableau.
> balance <- data.frame(desc = c("Starting Cash",
+     "Sales", "Refunds", "Payouts", "Court Losses",
+     "Court Wins", "Contracts", "End Cash"), amount = c(2000,
+     3400, -1100, -100, -6600, 3800, 1400, 2800))
> balance
           desc amount
1 Starting Cash   2000
2         Sales   3400
3       Refunds  -1100
4       Payouts   -100
5  Court Losses  -6600
6    Court Wins   3800
7     Contracts   1400
8      End Cash   2800
To preserve the row order of the data frame in the plot, I convert the desc variable to a factor whose levels follow the current row order; id and type variables are also added:
> balance$desc <- factor(balance$desc, levels = balance$desc)
> balance$id <- seq_along(balance$amount)
> balance$type <- ifelse(balance$amount > 0, "in",
+     "out")
> balance[balance$desc %in% c("Starting Cash", "End Cash"),
+     "type"] <- "net"
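As a quick illustration of why the explicit levels matter (this is a side sketch, not part of the original code), compare a factor built with default levels against one built with the levels given in row order. The variable names `alphabetical` and `in_order` are made up for the example.

```r
# Same labels as in the data above, as a plain character vector
desc <- c("Starting Cash", "Sales", "Refunds", "Payouts",
          "Court Losses", "Court Wins", "Contracts", "End Cash")

# Default factor(): the levels are sorted, so the row order is lost
alphabetical <- factor(desc)

# Explicit levels: the levels follow the order of the rows
in_order <- factor(desc, levels = desc)

levels(alphabetical)
levels(in_order)
```

A discrete x-axis is laid out in level order, so with the default levels "Contracts" would come first on the axis instead of "Starting Cash".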
Next the data will be slightly reworked to specify the coordinates for drawing the waterfall bars.
> balance$end <- cumsum(balance$amount)
> balance$end <- c(head(balance$end, -1), 0)
> balance$start <- c(0, head(balance$end, -1))
> balance <- balance[, c(3, 1, 4, 6, 5, 2)]
> balance
  id          desc type start   end amount
1  1 Starting Cash  net     0  2000   2000
2  2         Sales   in  2000  5400   3400
3  3       Refunds  out  5400  4300  -1100
4  4       Payouts  out  4300  4200   -100
5  5  Court Losses  out  4200 -2400  -6600
6  6    Court Wins   in -2400  1400   3800
7  7     Contracts   in  1400  2800   1400
8  8      End Cash  net  2800     0   2800
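The same coordinate logic can be wrapped in a small function, which also makes it easy to check that the numbers add up. This is only a sketch: `waterfall_coords` is a hypothetical helper name, and it assumes, as in this dataset, that the last amount is the closing total rather than a change.

```r
# Hypothetical helper (not from the original code): derive the bar
# coordinates from a vector of amounts whose last element is the total.
waterfall_coords <- function(amount) {
  end <- cumsum(amount)
  end <- c(head(end, -1), 0)     # the closing bar is drawn back down to zero
  start <- c(0, head(end, -1))   # each bar starts where the previous one ended
  data.frame(start = start, end = end)
}

amount <- c(2000, 3400, -1100, -100, -6600, 3800, 1400, 2800)
coords <- waterfall_coords(amount)

# The running total before the last row should equal the stated closing amount
stopifnot(tail(coords$start, 1) == tail(amount, 1))

# Every bar except the closing one should span exactly its own amount
n <- length(amount)
stopifnot(all(coords$end[-n] - coords$start[-n] == amount[-n]))
```

If the first check fails, the individual changes do not add up to the closing figure and the last bar would not line up with the one before it.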
Plotting
Now everything is set to plot the first waterfall chart. geom_rect is used to draw the rectangles using the coordinates calculated in the previous step.
> library(ggplot2)
> ggplot(balance, aes(desc, fill = type)) + geom_rect(aes(x = desc,
+     xmin = id - 0.45, xmax = id + 0.45, ymin = end,
+     ymax = start))
The fill mapping could use some tweaking: my preference is to have outflows in red, inflows in green, and the net position in blue. Since the default fill colours are assigned in the order of the factor levels, I change the order of the underlying levels.
> balance$type <- factor(balance$type, levels = c("out",
+     "in", "net"))
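To see why reordering the levels changes the colours: a discrete fill scale hands out its palette in level order, so the position of each level decides which colour it gets. The sketch below uses base R only; the `type` vector is copied from the table above, and the names `default_order` and `custom_order` are made up for the illustration.

```r
type <- c("net", "in", "out", "out", "out", "in", "in", "net")

# Without explicit levels they are sorted: "in", "net", "out"
default_order <- factor(type)

# With explicit levels "out" comes first, then "in", then "net"
custom_order <- factor(type, levels = c("out", "in", "net"))

# Position of each level, i.e. which palette slot it would be given
match(c("out", "in", "net"), levels(default_order))
match(c("out", "in", "net"), levels(custom_order))
```

With three levels the default ggplot2 palette runs from a reddish hue through green to blue, which is why this ordering gives the colours I was after. An alternative that I did not use here would be to name the colours explicitly with scale_fill_manual().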
Almost ready. One more tweak, this time to the x-axis labels: the helper function below replaces spaces with newlines, making the labels more readable.
> strwr <- function(str) gsub(" ", "\n", str)
> (p1 <- ggplot(balance, aes(fill = type)) + geom_rect(aes(x = desc,
+     xmin = id - 0.45, xmax = id + 0.45, ymin = end,
+     ymax = start)) + scale_y_continuous("", labels = scales::comma) +
+     scale_x_discrete("", breaks = levels(balance$desc),
+         labels = strwr(levels(balance$desc))) +
+     theme(legend.position = "none"))
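For reference, here is what the label helper does to a few of the descriptions. The helper is repeated so the snippet runs on its own; the `labels` vector is just an example input.

```r
# Same helper as above: every space becomes a line break
strwr <- function(str) gsub(" ", "\n", str)

labels <- strwr(c("Starting Cash", "Court Losses", "Sales"))

# Print each label, separated by a marker line, to show the breaks
cat(labels, sep = "\n---\n")
```

Single-word labels such as "Sales" pass through unchanged, while two-word labels are stacked on two lines and so take up less horizontal space under each bar.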
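Finally, the bar labels are added. Positioning them conditionally takes quite a bit of code, as you can see: inflow and outflow bars are labelled with their amount, while the two net bars are labelled with the cash position.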
> library(scales)
> p1 + geom_text(data = subset(balance, type == "in"),
+     aes(id, end, label = comma(amount)), vjust = 1, size = 3) +
+     geom_text(data = subset(balance, type == "out"),
+         aes(id, end, label = comma(amount)), vjust = -0.3, size = 3) +
+     geom_text(data = subset(balance, type == "net" & id == min(id)),
+         aes(id, end, colour = type, label = comma(end),
+             vjust = ifelse(end < start, 1, -0.3)), size = 3.5) +
+     geom_text(data = subset(balance, type == "net" & id == max(id)),
+         aes(id, start, colour = type, label = comma(start),
+             vjust = ifelse(end < start, -0.3, 1)), size = 3.5)
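One way to shorten the labelling step, sketched here as an alternative rather than as what the code above does, is to compute the label text and position as columns first. The columns `label_y`, `label` and `label_vjust` are hypothetical names, and the simple vjust rule assumes that both the starting and the closing cash positions are positive, as they are in this dataset.

```r
# Plotting data rebuilt from the table above so the snippet runs on its own
balance <- data.frame(
  id     = 1:8,
  type   = c("net", "in", "out", "out", "out", "in", "in", "net"),
  start  = c(0, 2000, 5400, 4300, 4200, -2400, 1400, 2800),
  end    = c(2000, 5400, 4300, 4200, -2400, 1400, 2800, 0),
  amount = c(2000, 3400, -1100, -100, -6600, 3800, 1400, 2800)
)

is_last <- balance$id == max(balance$id)

# y position: the closing bar is labelled at its start, all others at their end
balance$label_y <- ifelse(is_last, balance$start, balance$end)

# Text: net bars show the cash position, in/out bars show the change
balance$label <- format(
  ifelse(balance$type == "net", balance$label_y, balance$amount),
  big.mark = ",", trim = TRUE
)

# Vertical justification: 1 for inflows, -0.3 for everything else
# (assumes positive starting and closing positions)
balance$label_vjust <- ifelse(balance$type == "in", 1, -0.3)

balance[, c("id", "type", "label_y", "label", "label_vjust")]
```

With these columns in place, a single geom_text layer mapping label_y, label and label_vjust could stand in for the four layers, at the cost of losing the separate text size and colour used for the two net labels.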