Five problems (and one solution) with dual-axis time series plots
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
If you need to present two time series spanning the same period, but in wildly different scales, it's tempting to use a time series chart with two separate vertical axes, one for each series, like this one from the Reserve Bank of New Zealand:
Charts like this typically have one or more crossover points, and the crossing gives the viewer a sense that one series is now “ahead” of the other. One problem is that crossover points in dual-axis time series charts are entirely arbitrary. Changing either the left-hand or right-hand scale (and replotting the data accordingly) will change where the crossover points appear. And if, as is often the case, the scales are chosen automatically so that each series uses the full vertical space available, simply changing the time range of the data plotted will also move the crossover points.
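To make the arbitrariness concrete, here is a small Python sketch. The post itself is about R, and the two series below are made up purely for illustration; `to_position` and `crossovers` are hypothetical helpers. The sketch maps each series onto a vertical position for given axis limits and reports where the plotted lines swap order. The left axis stays fixed while only the right-axis limits change.

```python
# Hypothetical illustrative data: series A is read off the left axis,
# series B off the right axis. Neither comes from the RBNZ chart.
A = [10, 12, 14, 16, 18, 20]
B = [3, 4, 5, 5, 6, 8]


def to_position(values, lo, hi):
    """Map data values to a vertical position in [0, 1] for axis limits (lo, hi)."""
    return [(v - lo) / (hi - lo) for v in values]


def crossovers(pos_a, pos_b):
    """Indices i where the two plotted lines swap order between points i-1 and i."""
    diffs = [a - b for a, b in zip(pos_a, pos_b)]
    return [i for i in range(1, len(diffs)) if diffs[i - 1] * diffs[i] < 0]


left = to_position(A, 10, 20)  # left axis fixed at 10..20
for right_limits in [(0, 10), (0, 16)]:
    right = to_position(B, *right_limits)
    print(right_limits, crossovers(left, right))
```

With the right axis running from 0 to 10, the lines cross between points 2 and 3; stretching it to run from 0 to 16 moves the crossing one step earlier, even though the data are unchanged.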
In an excellent blog post, statistician Peter Ellis points out five problems with dual-axis time series charts:
- The designer has to make choices about scales and this can have a big impact on the viewer
- In particular, “cross-over points” where one series crosses another are results of the design choices, not intrinsic to the data, and viewers (particularly unsophisticated viewers) will not appreciate this and will think there is more significance in the crossover than is actually the case
- They make it easier to lazily associate correlation with causation, not taking into account autocorrelation and other time-series issues
- Because of the issues above, in malicious hands they make it possible to deliberately mislead
- They often look cluttered and aesthetically unpleasing
A simple alternative is to rescale both time series, for example by defining both series to have a nominal value at a specific time: say, both start at 100 on January 1, 2016. This is a useful way to compare the growth in two series since the beginning of the year, and it means both can be represented on a single scale. (If you're using the ggplot2 package in R to plot time series, you can use the stat_index function from Peter's ggseas package to scale time series in this way.) The problem, though, is that the chart loses interpretability, because the true scales of both time series are gone.
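The rescaling described above amounts to dividing each series by its value at the reference time and multiplying by 100. Here is a minimal Python sketch of that idea. The function name `index_to` and the sample values are hypothetical, and this is not how ggseas's stat_index is implemented.

```python
def index_to(values, ref_index, base=100.0):
    """Rescale a series so that values[ref_index] == base (a multiplicative index)."""
    ref = values[ref_index]
    return [base * v / ref for v in values]


# Two hypothetical series on very different scales, both indexed to their first value
print(index_to([4, 5, 6], 0))        # [100.0, 125.0, 150.0]
print(index_to([200, 180, 240], 0))  # [100.0, 90.0, 120.0]
```

Both series now read 100 at the reference point and share one axis. However, the original values can only be recovered if you also know each series' reference value, and that is the loss of interpretability mentioned above.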
All that being said, Peter suggests that there are times when a dual-axis chart can be appropriate, for example when the two axes are conceptually similar (as above, where both are linear monetary scales) and you use a consistent process to set the scales of the vertical axes. Other considerations include color-coding the axes for interpretability and choosing colors that don't favor one series over the other. Implementing these best practices, Peter has created the dualplot() function for R, which chooses the axes according to a crossover point you specify. This is equivalent to rescaling the series to have the same value at that specified point, but keeps the real-value axes for interpretability. Here's the above chart, rendered with dualplot() with a crossover point at January 2014:
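Here is one way to get the "same value at the crossover point" behavior while keeping real-valued axes, sketched in Python. It assumes proportional (index-style) scaling between the two axes. `right_axis_limits` and `position` are hypothetical helpers for illustration, not the code of Peter's dualplot().

```python
def right_axis_limits(left_series, right_series, cross_index, left_limits):
    """Right-axis limits that put both series at the same height at cross_index.

    Assumes proportional scaling: a right-axis value y is drawn at the height
    of the left-axis value y * k, where k = left[cross] / right[cross].
    """
    k = left_series[cross_index] / right_series[cross_index]
    lo, hi = left_limits
    return lo / k, hi / k


def position(value, lo, hi):
    """Vertical position in [0, 1] of a value on an axis running from lo to hi."""
    return (value - lo) / (hi - lo)


left = [4, 5, 6]      # hypothetical series on the left axis
right = [40, 20, 60]  # hypothetical series on the right axis
limits = right_axis_limits(left, right, cross_index=1, left_limits=(0, 8))
print(limits)  # (0.0, 32.0)
print(position(left[1], 0, 8), position(right[1], *limits))  # 0.625 0.625
```

Because the scaling is proportional, this is the same as indexing both series to a common value at the chosen point. The only difference is that the two axes are labeled in the original units rather than as an index. The choice matters: fixing a single crossover leaves one free parameter if the axes may be related by any linear map with an offset, and proportional scaling is one simple way to pin that parameter down.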
For more great discussion of the pros and cons of dual-axis time series charts, and the R code for the dualplot() function, follow the link to Peter's blog post below.
Peter's stats stuff: Dual axes time series plots may be ok sometimes after all (via Harlan Harris)