Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a post a few months ago I built a new ggplot2 statistical transformation (“stat”) to provide X13-SEATS-ARIMA seasonal adjustment on the fly. With certain exploratory workflows this can be useful, saving on a step of seasonally adjusting many different slices and dices of a multi-dimensional time series during data reshaping, and just drawing it for you as part of the graphics definition process.
That code no longer works – one of the hazards of the fast changing data science environment – since the release of ggplot2 version 2.0 just before Christmas. ggplot2 v.2 introduced a whole new system of object oriented programming, ggproto, replacing the old approach that was based on proto. The good news is that the new approach is easy to use and well documented, and it’s been so successful in encouraging development based on ggplot that there’s now a whole website just to keep track of packages extending ggplot.
So it seemed time to wrap my little convenience function for seasonal adjustment into an R package which duly has its home on GitHub. Like my similar posts on this, this work depends on Christophe Sax’s {seasonal} R package, and of course the US Census Bureau’s X13-SEATS-ARIMA software.
X13-SEATS-ARIMA
Here’s how it works, applying SEATS seasonal adjustment with the default outlier etc settings to the classic international Air Passengers example dataset. The original unadjusted data are in pale grey in the background, and the seasonally adjusted line is the dark one:
# installation if not already done library(devtools) install_github("ellisp/ggseas/pkg") library(ggseas) ggplot(ap_df, aes(x = x, y = y)) + geom_line(colour = "grey80") + stat_seas(start = c(1949, 1), frequency = 12) + ggtitle("SEATS seasonal adjustment - international airline passengers") + ylab("International airline passengers per month")
Extra arguments can get passed through to X13 via the x13_params argument, which takes a list of arguments that are passed through to the “list” argument to seas():
ggplot(ap_df, aes(x = x, y = y)) + geom_line(colour = "grey80") + stat_seas(start = c(1949, 1), frequency = 12, x13_params = list(x11 = "", outlier = NULL)) + ggtitle("X11 seasonal adjustment - international airline passengers") + ylab("International airline passengers per month")
The reason for doing this with ggplot2 of course is that we can have not just univariate series, but data split by dimensions mapped to facets, colour or other aesthetics. This works very naturally, with stat_seas working just like other stats and geoms in the ggplot2 universe:
p3 <- ggplot(ldeaths_df, aes(x = YearMon, y = deaths, colour = sex)) + geom_point(colour = "grey50") + geom_line(colour = "grey50") + facet_wrap(~sex) + stat_seas(start = c(1974, 1), frequency = 12, size = 2) + ggtitle("Seasonally adjusted lung deaths in the UK 1974 - 1979") + ylab("Deaths") + xlab("(light grey shows original data;ncoloured line is seasonally adjusted)") + theme(legend.position = "none") print(p3) # note - the call to seas() isn't run until each and every time you *print* the plot
More basic seasonal decomposition
I included two alternative ways of doing seasonal adjustment, more for completion than anything else. Here’s one demo with Loess-based seasonal decomposition, using the stl() function from the {stats} package:
ggplot(ap_df, aes(x = x, y = y)) + stat_stl(frequency = 12, s.window = 7)
This sort of seasonal decomposition is quicker to fit than X13-SEATS-ARIMA but not good enough for (for example) publishing official statistics. However, it might be useful in an exploratory workflow when you don’t want the computation overhead of fitting ARIMA models. This package is aiming to help is that sort of quick analysis.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.