[This article was first published on Anything but R-bitrary, and kindly contributed to R-bloggers].
Sangyoon Lee
Background
While thinking about ways to represent incoming and outgoing flows in a business process, I considered export-import charts like the back-to-back bar chart shown on the Learning R blog. As that author acknowledges, however, it is difficult to compare individual values in these charts. I still wanted this kind of graph for an at-a-glance view before breaking it into facets and comparing individual values.
My Solution
In the Learning R article, the author shows multiple categories of import and export using stacked bars. Instead of representing multiple categories, I use the color intensity of the bars’ fill as visual reinforcement of information the graph already contains. Import and export are represented by red and blue, respectively, and the varying intensity helps the reader compare bars that are not side by side.
In the example below, I use the same subset of data as the motivating post; please refer to the linked article for the data. Make sure to click the link that says “Access the subset used in this post in here.” rather than going to the Eurostat website, and save the file as “trade.csv” in the working directory. These are monthly trade data for the 27 European Union countries by broad economic categories (BEC), in millions of euros.
First, load the necessary packages.
    library(ggplot2)
    library(plyr)
    library(reshape)
    library(scales)
For convenient and powerful data manipulation, plyr and reshape provide functions like ddply and melt. The scales package, relatively new at the time of writing, supplies the helper functions ggplot2 uses to format numbers on an axis, such as comma, along with the muted color helper used in the fill gradient.
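As a quick illustration of the two scales helpers that appear later in this post, the sketch below calls comma and muted directly, outside of any plot. The input numbers are made up for the example.

```r
library(scales)

# comma() inserts thousands separators; the plot uses it for y-axis labels
formatted <- comma(c(1000, 1234567))
print(formatted)

# muted() returns a toned-down version of a named color as a hex string
low.col <- muted("red")
high.col <- muted("blue")
print(c(low.col, high.col))
```

Passing comma to scale_y_continuous(labels = ...) applies the same formatting to every axis tick.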
Next, import the data, calculate the trade balance (export – import), and melt the data for ggplot2.
    trade <- read.csv("trade.csv", header = TRUE, stringsAsFactors = FALSE)
    balance <- ddply(trade, .(Time), summarise, balance = sum(EXP - IMP))
    trade.m <- melt(trade, id.vars = c("BEC", "Time"))
After the melt step, add another line to aggregate over BEC. This will further simplify the structure.
    trade.a <- ddply(trade.m, c("Time", "variable"), summarise, value = sum(value))
At this point, the data will look like this:
    > head(trade.a)
         Time variable    value
    1 2008M05      EXP 273153.2
    2 2008M05      IMP 260789.1
    3 2008M06      EXP 284994.7
    4 2008M06      IMP 273033.0
    5 2008M07      EXP 284681.6
    6 2008M07      IMP 271122.2
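For readers without the Eurostat file at hand, the same three steps can be tried on a toy data frame. This is only a sketch: the toy data frame, its two BEC codes, and its numbers are invented for illustration, and only the column names follow trade.csv as it is used above.

```r
library(plyr)
library(reshape)

# Hypothetical stand-in for trade.csv: two categories, two months
toy <- data.frame(
  BEC = c("A", "B", "A", "B"),
  Time = c("2008M05", "2008M05", "2008M06", "2008M06"),
  EXP = c(10, 20, 30, 40),
  IMP = c(5, 25, 10, 20),
  stringsAsFactors = FALSE
)

# Trade balance per month, summed over categories
toy.balance <- ddply(toy, .(Time), summarise, balance = sum(EXP - IMP))

# Long format: EXP and IMP become levels of "variable"
toy.m <- melt(toy, id.vars = c("BEC", "Time"))

# Collapse the categories so each month has one EXP and one IMP row
toy.a <- ddply(toy.m, c("Time", "variable"), summarise, value = sum(value))

print(toy.balance)
print(toy.a)
```

The melted data has one row per category, month, and flow direction, so four input rows become eight, and the aggregation then reduces them to two rows per month.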
We step through one layer at a time.
Layer 1: Start with export bars. We will add import data on the bottom of this graph.
    ggplot(trade.a, aes(x = Time)) +
      geom_bar(data = subset(trade.a, variable == "EXP"),
               aes(y = value, fill = value), stat = "identity")
Layer 2: Add the import data and attach it back-to-back to the export data by negating the values. Label the axes, format the y-axis with commas, and set a diverging fill that runs from red through white to blue.
    last_plot() +
      geom_bar(data = subset(trade.a, variable == "IMP"),
               aes(y = -value, fill = -value), stat = "identity") +
      scale_y_continuous(labels = comma) +
      xlab("") + ylab("Export - Import") +
      scale_fill_gradient2(low = muted("red"), mid = "white",
                           high = muted("blue"), midpoint = 0, space = "rgb")
Layer 3: Now add the balance trend line and a reference line at zero, and remove the meaningless legend.
    # theme() replaces opts(), which later ggplot2 releases removed
    last_plot() +
      geom_line(data = balance, aes(Time, balance, group = 1), size = 1) +
      geom_hline(yintercept = 0, colour = "grey90") +
      theme(legend.position = "none")
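One way to see why the balance line fits naturally on this chart: exports are drawn at +value and imports at -value, so the balance for a month is simply the sum of that month's two bar heights. A minimal base R check of this, using invented numbers in the same layout as trade.a:

```r
# Hypothetical aggregated data in the same layout as trade.a
toy.a <- data.frame(
  Time = rep(c("2008M05", "2008M06"), each = 2),
  variable = rep(c("EXP", "IMP"), times = 2),
  value = c(30, 28, 35, 31),
  stringsAsFactors = FALSE
)

# Bar heights as drawn: exports above zero, imports below
height <- ifelse(toy.a$variable == "IMP", -toy.a$value, toy.a$value)

# Summing the signed heights per month gives export minus import
toy.balance <- tapply(height, toy.a$Time, sum)
print(toy.balance)
```

A positive result means the export bar is taller than the import bar for that month, which is where the line sits above the zero reference.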
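Layer 4: Finally, relabel the x-axis so it is easier to read, with the month on one line and the two-digit year beneath it. The result is my final product.
    # One label per month, in axis order (trade.a is sorted by Time)
    labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", unique(trade.a$Time))
    last_plot() + scale_x_discrete(labels = labels)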
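The regular expression is easier to follow on a few values in isolation. In this sketch the times vector is a made-up sample in the same format as the Time column; the first capture group is the two-digit year, the second is the month, and the replacement puts the month on the first line and the year on the second.

```r
# Hypothetical sample of Time values
times <- c("2008M05", "2008M12", "2009M01")

# \\2 is the month, \\1 the two-digit year, separated by a line break
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", times)

# print() shows the embedded "\n"; cat() shows the label as it is drawn
print(labels)
cat(labels[1], "\n")
```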
The resulting plot shows the overall export and import trend, with different color intensities to reinforce the size of each bar. This eases the cognitive burden placed on readers when they visually compare export versus import.
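Because ggplot2 has changed since this post was written, here is a compact sketch of the same chart built in one piece on invented data. It is an adaptation rather than the code above: it assumes a ggplot2 release that provides geom_col() and theme(), it stores the signed bar height in a helper column (called signed here, a name chosen only for this example), and it passes the relabeling to the x scale as a function instead of a vector.

```r
library(ggplot2)
library(scales)

# Hypothetical aggregated data in the same layout as trade.a
toy.a <- data.frame(
  Time = rep(c("2008M05", "2008M06", "2008M07"), each = 2),
  variable = rep(c("EXP", "IMP"), times = 3),
  value = c(30, 28, 35, 31, 33, 36),
  stringsAsFactors = FALSE
)

# Exports stay positive, imports are mirrored below zero
toy.a$signed <- ifelse(toy.a$variable == "IMP", -toy.a$value, toy.a$value)

# Balance per month is the sum of the two signed heights
toy.balance <- aggregate(signed ~ Time, data = toy.a, FUN = sum)

p <- ggplot(toy.a, aes(x = Time, y = signed, fill = signed)) +
  geom_col() +
  geom_hline(yintercept = 0, colour = "grey90") +
  geom_line(data = toy.balance, aes(x = Time, y = signed, group = 1),
            inherit.aes = FALSE) +
  scale_fill_gradient2(low = muted("red"), mid = "white",
                       high = muted("blue"), midpoint = 0) +
  scale_y_continuous(labels = comma) +
  scale_x_discrete(labels = function(x) {
    gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", x)
  }) +
  labs(x = NULL, y = "Export - Import") +
  theme(legend.position = "none")
```

Printing p draws the chart. Using a labelling function means the labels are derived from whatever values end up on the axis, so they cannot fall out of step with the data.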
While the overall trend shows more exports than imports, the story might be more complicated when there are subcategories. An example is the United States economy: an aggregated USA import-export chart will show significantly larger import bars than export bars, but when it is broken into different categories, especially agricultural goods, the graph will tell a different story from the overall trend.
In the meantime, this graph provides a quick at-a-glance look at exports and imports before digging deeper into various categories for further analysis.
All highlighted R-code segments were created by Pretty R at inside-R.org.
References
- ggplot2: Back-to-back Bar Charts. Learning R. URL http://learnr.wordpress.com/2009/09/24/ggplot2-back-to-back-bar-charts (accessed July 23, 2012).
- Pretty R Syntax Highlighter. inside-R.org. URL http://www.inside-r.org/pretty-r (accessed July 27, 2012).
- R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org
- Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12).
- Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York, USA.
- Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/
- Wickham, H. (2012). scales: Scale functions for graphics. R package version 0.2.1. URL http://CRAN.R-project.org/package=scales