Detecting seasonality
I occasionally get email asking how to detect whether seasonality is present in a data set. Sometimes the period of the potential seasonality is known, but in other cases it is not.
I’ve discussed before how to estimate an unknown seasonal period, and how to measure the strength of the seasonality. In this post, I want to look at testing if a series is seasonal when the potential period is known (e.g., with quarterly, monthly, daily or hourly data).
One simple approach is to fit a model which allows for seasonality if it is present. For example, you can fit an ETS model using ets() in R, and if the chosen model has a seasonal component, then the data is seasonal. For higher frequency data, or where the seasonal period is non-integer, a TBATS model will do much the same thing via the tbats() function.
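As a minimal sketch of this check, the code below fits both models and inspects whether a seasonal component was kept. The USAccDeaths series (a monthly data set shipped with base R) is used purely for illustration and is not part of the original discussion; the variable names are likewise just placeholders.

```r
library(forecast)

# Example data chosen for illustration: monthly US accidental deaths (base R).
y <- USAccDeaths

# Let ets() choose among candidate models, including seasonal ones.
fit_ets <- ets(y)

# components holds the error, trend and seasonal types plus the damping flag;
# a seasonal type of "N" means no seasonal component was selected.
ets_is_seasonal <- fit_ets$components[3] != "N"

# TBATS equivalent: seasonal.periods is NULL when no seasonal terms are kept.
fit_tbats <- tbats(y)
tbats_is_seasonal <- !is.null(fit_tbats$seasonal.periods)

print(fit_ets$method)
print(c(ets = ets_is_seasonal, tbats = tbats_is_seasonal))
```

Printing the fitted model's method string alongside the two flags makes it easy to see which model was chosen and why the flag came out as it did.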
This is not a formal test of seasonality, as the model selection is based on the AIC rather than any hypothesis test. However, there is a related likelihood ratio test based on comparing the selected model with the equivalent model that differs only by a seasonal term. Under the null hypothesis that the simpler model is adequate, twice the difference between the two log-likelihoods will asymptotically have a chi-squared distribution according to Wilks’ theorem. The degrees of freedom will be the difference in the number of parameters being estimated in the two models.
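Written out, the test statistic looks roughly as follows. The notation is introduced here only for illustration: \ell_1 and \ell_0 are the maximized log-likelihoods of the seasonal and non-seasonal models, and k_1 and k_0 are their numbers of estimated parameters.

```latex
D = 2\,(\ell_1 - \ell_0) \;\overset{a}{\sim}\; \chi^2_{k_1 - k_0}
\quad \text{under the null hypothesis that the non-seasonal model is adequate.}
```

A large value of D relative to the chi-squared distribution with k_1 - k_0 degrees of freedom gives a small p-value, which counts as evidence for the seasonal term.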
For example, the pigs data (monthly number of pigs slaughtered in Victoria) does not look very seasonal when plotted (see above), but the ets function selects an ETS(A,N,A) model. That is, it detects an additive seasonal component. We can formally test the significance of the seasonal component as follows.
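    library(fma)  # provides pigs; also loads forecast, which provides ets()
    fit1 <- ets(pigs)               # automatically selected model
    fit2 <- ets(pigs, model="ANN")  # same model without the seasonal component
    # Test statistic: twice the difference in log-likelihoods
    deviance <- 2*c(logLik(fit1) - logLik(fit2))
    # Degrees of freedom: difference in the number of estimated parameters
    df <- attributes(logLik(fit1))$df - attributes(logLik(fit2))$df
    # P value
    1-pchisq(deviance,df)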
The resulting p-value is small enough that the additional seasonal component is significant.
Personally, I never bother with the hypothesis test as I think it answers the wrong question. If the hypothesis test is significant, we can conclude that the data are very unlikely to have been generated from the simpler (non-seasonal) model. But I don’t actually believe the data were generated by any ETS model, so all this is telling me is that I have enough data to be able to see the difference between my data and the model.
A more useful question is whether the seasonal component improves forecast accuracy, and that is precisely what the AIC is telling us. Minimizing the AIC is asymptotically equivalent to minimizing the one-step-ahead out-of-sample MSE. So a smaller AIC means better forecasts, and that’s what I usually care about.
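One way to make that comparison explicit for the pigs example is to fit the seasonal and non-seasonal models directly and look at their AIC values side by side. This is only a sketch: the model strings "ANA" and "ANN" are fixed by hand here rather than chosen automatically, and the variable names are placeholders.

```r
library(fma)  # provides pigs and loads forecast

# Fit the additive seasonal model and its non-seasonal counterpart.
fit_seasonal <- ets(pigs, model = "ANA")
fit_nonseasonal <- ets(pigs, model = "ANN")

# Collect the AIC of each fit; the smaller value indicates the preferred model.
aics <- c(seasonal = fit_seasonal$aic, nonseasonal = fit_nonseasonal$aic)
print(aics)

# Name of the model with the smaller AIC.
preferred <- names(which.min(aics))
print(preferred)
```

If the seasonal model has the smaller AIC, keeping the seasonal component is expected to give better one-step-ahead forecasts, with no p-value involved.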