Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is also non-integer (averaging 365.25/7 ≈ 52.18 weeks per year). Consequently, seasonal ARIMA and ETS models tend not to give good results, even when a period of 52 is used as an approximation.
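To illustrate why a non-integer period is not a problem for Fourier terms, here is a minimal base-R sketch that builds the sine and cosine regressors directly. The helper `fourier_terms()` is hypothetical (it is not the forecast package's `fourier()`), and it assumes the time index t simply counts weeks from the start of the series.

```r
# Hypothetical helper: K pairs of sine/cosine regressors for a (possibly
# non-integer) seasonal period. Assumes t counts weeks: 1, 2, ..., n.
fourier_terms <- function(t, period, K) {
  X <- matrix(NA_real_, nrow = length(t), ncol = 2 * K)
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  for (k in seq_len(K)) {
    X[, 2 * k - 1] <- sin(2 * pi * k * t / period)
    X[, 2 * k]     <- cos(2 * pi * k * t / period)
  }
  X
}

m <- 365.25 / 7                          # average number of weeks per year
X <- fourier_terms(1:104, period = m, K = 3)
dim(X)                                   # 104 rows, 6 columns (S1, C1, ..., C3)
round(m, 2)                              # 52.18
```

The period only enters through the angle 2πkt/m, so nothing requires m to be an integer; K controls how flexible the seasonal shape can be.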
Regression with ARIMA errors
The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on US finished motor gasoline products supplied (in thousands of barrels per day) from February 1991 to May 2005. An updated version of the data is available from the EIA website. I select the number of Fourier terms by minimizing the AICc, adding one pair at a time and stopping as soon as the AICc no longer improves. The order of the ARIMA model is also selected by minimizing the AICc, although that is done within the auto.arima() function.
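```r
library(forecast)
# Weekly data; frequency is the average number of weeks per year (365.25/7 = 52.18)
gas <- ts(read.csv("http://robjhyndman.com/data/gasoline.csv", header = FALSE)[, 1],
          frequency = 365.25/7, start = 1991 + 31/365.25)  # starts around 1 Feb 1991
# Add Fourier pairs one at a time until the AICc stops improving;
# seasonality is handled by the Fourier terms, hence seasonal = FALSE
bestfit <- list(aicc = Inf)
for (i in 1:25) {
  fit <- auto.arima(gas, xreg = fourier(gas, K = i), seasonal = FALSE)
  if (fit$aicc < bestfit$aicc) {
    bestfit <- fit
    bestK <- i
  } else break
}
# Forecast 104 weeks (two years) ahead with future Fourier terms for the chosen K
fc <- forecast(bestfit, xreg = fourier(gas, K = bestK, h = 104))
plot(fc)
```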
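The greedy AICc search can be mimicked in base R, which makes the selection rule explicit. This is a sketch under stated assumptions: the data are simulated (not the gasoline series), and a fixed AR(1) error model fitted with `stats::arima()` stands in for `auto.arima()`, so only K is searched. The AICc here counts the innovation variance as a parameter, and `fourier_terms()` is again a hypothetical helper.

```r
# Assumptions: simulated weekly data with two true harmonics, and fixed AR(1)
# errors via stats::arima() instead of auto.arima().
set.seed(2011)
m <- 365.25 / 7
n <- 520
t <- seq_len(n)
season <- 3 * sin(2 * pi * t / m) + 1.5 * cos(2 * pi * 2 * t / m)
y <- 100 + season + arima.sim(list(ar = 0.6), n = n)

# Hypothetical helper: 2K sin/cos columns named S1, C1, S2, C2, ...
fourier_terms <- function(t, period, K) {
  X <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))))
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

# AICc = AIC + 2k(k + 1) / (n - k - 1), with k = coefficients + sigma^2
aicc <- function(fit, n) {
  k <- length(fit$coef) + 1
  fit$aic + 2 * k * (k + 1) / (n - k - 1)
}

best <- list(aicc = Inf, K = NA)
for (K in 1:10) {
  fit <- arima(y, order = c(1, 0, 0), xreg = fourier_terms(t, m, K))
  score <- aicc(fit, n)
  # Keep the fit while AICc improves; stop at the first K that does not
  if (score < best$aicc) best <- list(aicc = score, K = K, fit = fit) else break
}
best$K
```

Because only two harmonics were simulated, the search should stop at a small K. Stopping at the first non-improvement is cheap, but it can miss a lower AICc further along if the criterion is not monotone in K.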
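The fitted model has 12 pairs of Fourier terms and can be written as

$$y_t = \sum_{j=1}^{12}\left[\alpha_j \sin\left(\frac{2\pi j t}{52.18}\right) + \beta_j \cos\left(\frac{2\pi j t}{52.18}\right)\right] + n_t,$$

where $\alpha_j$ and $\beta_j$ are regression coefficients and $n_t$ is the ARIMA error process selected by auto.arima().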
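The seasonal part of this model is a sum of 12 sine/cosine pairs with period 52.18 weeks. As a rough check of what that buys, the sketch below evaluates such a component with made-up coefficients (they are not the fitted values for `gas`) and compares shifting it by 52.18 weeks against shifting it by 52 weeks. The function name `seasonal_part()` is hypothetical.

```r
# Seasonal component s_t = sum_j [alpha_j sin(2 pi j t/m) + beta_j cos(2 pi j t/m)]
m <- 365.25 / 7
seasonal_part <- function(t, alpha, beta, period = m) {
  j <- seq_along(alpha)
  # length(t) x K matrix of angles 2*pi*j*t/period
  ang <- outer(t, j, function(tt, jj) 2 * pi * jj * tt / period)
  as.vector(sin(ang) %*% alpha + cos(ang) %*% beta)
}

set.seed(1)
alpha <- rnorm(12)   # hypothetical coefficients for 12 Fourier pairs
beta  <- rnorm(12)
t <- 1:200
s <- seasonal_part(t, alpha, beta)

# The pattern repeats every m = 52.18 weeks (difference is rounding error only),
# but not every 52 weeks
max(abs(seasonal_part(t + m, alpha, beta) - s))
max(abs(seasonal_part(t + 52, alpha, beta) - s))
```

The first difference is at the level of floating-point error; the second is not, which is the drift a period-52 approximation would introduce.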
TBATS
An alternative approach is the TBATS model introduced by De Livera et al. (JASA, 2011). This uses a state space model that is a generalization of those underpinning exponential smoothing. It also allows for automatic Box-Cox transformation and ARMA errors. The modelling algorithm is entirely automated:
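To see how a state space model can carry a non-integer period, here is a minimal sketch of the trigonometric seasonal recursion used for each harmonic in TBATS, as described by De Livera et al. (2011): with frequency λ_j = 2πj/m, each harmonic is a pair of states rotated by the angle λ_j at every step. The sketch switches off the error-driven updates (the γ terms) so the deterministic behaviour is visible; `trig_harmonic()` is a hypothetical name and this is not the forecast package's implementation.

```r
# One trigonometric seasonal harmonic (error terms omitted, i.e. d_t = 0):
#   s_t  =  s_{t-1} cos(lambda) + s*_{t-1} sin(lambda)
#   s*_t = -s_{t-1} sin(lambda) + s*_{t-1} cos(lambda),  lambda = 2*pi*j/m
trig_harmonic <- function(n, j, m, s0 = 1, s0_star = 0) {
  lambda <- 2 * pi * j / m
  s <- numeric(n)
  s_star <- numeric(n)
  prev <- s0
  prev_star <- s0_star
  for (t in seq_len(n)) {
    s[t]      <-  prev * cos(lambda) + prev_star * sin(lambda)
    s_star[t] <- -prev * sin(lambda) + prev_star * cos(lambda)
    prev <- s[t]
    prev_star <- s_star[t]
  }
  list(s = s, s_star = s_star)
}

m <- 365.25 / 7
h <- trig_harmonic(n = 156, j = 1, m = m)
# Starting from s0 = 1, s0* = 0 the recursion traces cos(2*pi*t/m)
max(abs(h$s - cos(2 * pi * seq_len(156) / m)))
```

Since the rotation angle is just 2πj/m, a period of 52.18 is handled exactly as easily as 52; in the full model the γ-weighted errors let each harmonic's amplitude and phase evolve over time.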
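```r
# Box-Cox, ARMA errors, damping and the number of Fourier pairs are all chosen automatically
gastbats <- tbats(gas)
fc2 <- forecast(gastbats, h = 104)
plot(fc2, ylab = "thousands of barrels per day")
```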
(The tbats function generates some warnings here, but it still works OK. I'll fix the warnings in the next version.)
Here the fitted model is given at the top of the plot as TBATS(0.999, {2,2}, 1, {<52.18,8>}). That is, a Box-Cox parameter of 0.999 (essentially no transformation), ARMA(2,2) errors, a damping parameter of 1 (no damping) and 8 Fourier pairs with period $m = 52.18$.
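The claim that a Box-Cox parameter of 0.999 does essentially nothing is easy to check numerically. This sketch uses the standard Box-Cox definition, (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0, on hypothetical values chosen only for illustration; `box_cox()` is a hypothetical helper, not a forecast package function.

```r
# Standard Box-Cox transformation (hypothetical helper)
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-12) log(y) else (y^lambda - 1) / lambda
}

y <- seq(6000, 10000, by = 100)   # hypothetical values for illustration
z <- box_cox(y, lambda = 0.999)

# Over this range the transformed series is almost a straight line in y
cor(y, z)
```

With λ = 1 the transformation is just y − 1, a shift that leaves the model unchanged, so a value of 0.999 amounts to fitting the data on the original scale.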
Which to use?
In this example, the forecasts are almost identical and there is little to differentiate the two models. The TBATS model is preferable when the seasonality changes over time, or when there are multiple seasonal periods. The ARIMA approach is preferable if there are covariates that are useful predictors, as these can be added as additional regressors.
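Adding a covariate to the regression with ARIMA errors only means appending a column to the regressor matrix. The sketch below is illustrative only: the data and the covariate `x` are simulated (hypothetical), and `stats::arima()` with fixed AR(1) errors stands in for `auto.arima()`. It also shows the practical catch: forecasting requires future values of every regressor, including the covariate.

```r
# Assumptions: simulated data, hypothetical covariate x known in advance,
# fixed AR(1) errors via stats::arima() instead of auto.arima()
set.seed(42)
m <- 365.25 / 7
n <- 400
h <- 52
t_all <- seq_len(n + h)

fourier_terms <- function(t, period, K) {
  X <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))))
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

x_all <- rnorm(n + h)                             # covariate, past and future
X_all <- cbind(fourier_terms(t_all, m, K = 2), x = x_all)

y <- 50 + 4 * X_all[1:n, "S1"] + 2 * x_all[1:n] +
  arima.sim(list(ar = 0.5), n = n, sd = 0.5)

fit <- arima(y, order = c(1, 0, 0), xreg = X_all[1:n, ])
coef(fit)["x"]                                    # true simulated effect is 2

# Forecasts need future values of all regressors, including x
fc <- predict(fit, n.ahead = h, newxreg = X_all[n + seq_len(h), ])
length(fc$pred)
```

TBATS has no equivalent slot for external regressors, which is the main reason to prefer the regression approach when good predictors are available.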