Why time series forecast prediction intervals aren’t as good as we’d hope
Five different sources of error
When it comes to time series forecasts from a statistical model we have five sources of error:
1. Random individual errors
2. Random estimates of parameters (eg the coefficients for each autoregressive term)
3. Uncertain meta-parameters (eg number of autoregressive terms)
4. Unsure if the model was right for the historical data
5. Even given #4, unsure if the model will continue to be right
A confidence interval is an estimate of the statistical uncertainty of the estimated parameters in the model. It usually covers only source #2 above: it is not concerned with #1, and it is conditional on sources #3, #4 and #5 all being taken out of the picture. A prediction interval should ideally take all five sources into account (see Rob Hyndman for more on the distinction between prediction and confidence intervals).
Unfortunately, the standard ways of providing time series prediction intervals typically only take source #1 into account – random individual errors. This differs from standard prediction intervals from more straightforward regression and generalized linear models, which at least usually factor in uncertainty of the estimates of parameters.
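To make the contrast with ordinary regression concrete, here is a minimal sketch in base R. The data are simulated purely for illustration (the coefficients, sample size and seed are arbitrary assumptions, not taken from the analysis in this post). It shows that `predict()` on an `lm` fit can return either a confidence interval for the mean response, which reflects parameter uncertainty only, or a prediction interval, which also adds the individual random error.

```r
# Illustrative data only: a straight line plus noise (all values are assumptions)
set.seed(123)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

new_data <- data.frame(x = 5)

# Confidence interval: uncertainty about the estimated mean response at x = 5
conf_int <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)

# Prediction interval: the same, plus the variance of a single new observation
pred_int <- predict(fit, newdata = new_data, interval = "prediction", level = 0.95)

conf_width <- unname(conf_int[, "upr"] - conf_int[, "lwr"])
pred_width <- unname(pred_int[, "upr"] - pred_int[, "lwr"])

print(c(confidence = conf_width, prediction = pred_width))
```

The prediction interval is the wider of the two because it accounts for sources #1 and #2 together, which is the behaviour that standard time series prediction intervals typically do not match.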
The problem is that for all but the most trivial time series forecasting method there is no simple way of estimating the uncertainty that comes from having estimated the parameters from the data, and much less so the values of meta-parameters like the amount of differencing needed, how many autoregressive terms, how many moving average terms, etc (those example meta-parameters come from the Box-Jenkins ARIMA approach, but other forecasting methods have their own meta-parameters to estimate too).
Demonstration of the cost of estimating meta-parameters
Here’s a simple simulation to show the cost of estimating the meta-parameters, even when sources of error #4 and #5 can be discounted. I generated 10,000 time series, each 100 observations long, from an ARIMA(1, 1, 1) process and split them into a 90 observation training set and 10 observation test set. This creates data that looks like this (just four examples shown):
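As a rough sketch of how one such series could be generated and split using only base R: the AR and MA coefficients (0.5 each) and the seed are assumptions chosen for illustration, since the values used in the simulation are not stated here. An ARIMA(1, 1, 1) series is built by cumulatively summing an ARMA(1, 1) series.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

n_total <- 100
n_train <- 90

# ARMA(1,1) series; ar = 0.5 and ma = 0.5 are assumed values
arma_part <- arima.sim(model = list(ar = 0.5, ma = 0.5), n = n_total)

# Integrating once (cumulative sum) turns ARMA(1,1) into ARIMA(1,1,1)
y <- ts(cumsum(arma_part))

# First 90 observations for training, last 10 held out for testing
train <- window(y, end = n_train)
test  <- window(y, start = n_train + 1)

length(train)
length(test)
```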
For each one of those datasets, I fit an ARIMA model to the 90 observation training set two ways:
- using forecast::Arima and advising the algorithm that the model to use is (1,1,1)
- using forecast::auto.arima and forcing the algorithm to estimate from the data the correct level of differencing, number of autoregression terms, and number of moving average terms.
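The two approaches can be sketched on a single simulated series as follows. This assumes the forecast package is installed; the series itself uses assumed coefficients (ar = 0.5, ma = 0.5) and an arbitrary seed, so it illustrates the calls rather than reproducing the original analysis.

```r
library(forecast)

set.seed(1)  # arbitrary seed
# Assumed ARIMA(1,1,1) series: cumulative sum of an ARMA(1,1) series
y <- ts(cumsum(arima.sim(model = list(ar = 0.5, ma = 0.5), n = 100)))
train <- window(y, end = 90)

# 1. The order (p, d, q) is supplied by the analyst
fit_known <- Arima(train, order = c(1, 1, 1))

# 2. The order is chosen from the data by the algorithm
fit_auto <- auto.arima(train)

# Compare the order actually used by each fit
print(arimaorder(fit_known))
print(arimaorder(fit_auto))

# Ten-step-ahead forecasts with 80% and 95% prediction intervals
fc_known <- forecast(fit_known, h = 10, level = c(80, 95))
fc_auto  <- forecast(fit_auto,  h = 10, level = c(80, 95))

# Columns of $lower and $upper are ordered by level: 1 = 80%, 2 = 95%
width_known <- fc_known$upper[, 2] - fc_known$lower[, 2]
width_auto  <- fc_auto$upper[, 2]  - fc_auto$lower[, 2]
```

Whether `auto.arima` recovers (1,1,1) on any given series depends on the data; when it does not, the interval widths from the two fits will differ.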
Then I compared the resulting forecast prediction intervals (at 80% and 95% confidence levels) of the last 10 observations to see what percentage of forecasts actually had the true values in their range. Here’s the result:
As we can see, the Arima model, which knew in advance the correct meta-parameter specification, didn’t do too badly. Its prediction intervals contained the true values only slightly less often than the promised 80% and 95% of the time. But the auto.arima method, which had to estimate the meta-parameters itself, did noticeably worse. Not only that, as the forecast horizon increased, its prediction intervals became increasingly over-optimistic. Having the wrong meta-parameters for your model becomes more of a problem the further out you forecast (I suspect this is particularly the case for the amount of differencing that needs to be done; an unlucky choice here would have big implications for estimating the trend).
An important thing to remember is that out in the wild, results will generally be worse than this. On this occasion, both methods had the luxury of fitting the same family of models that had generated the data, and of knowing that the final ten observations were generated the same way. No outliers, black swans, or changes in data generating process to worry about. One way of partially dealing with this problem, through careful use of the method implemented in the forecastHybrid R package by David Shaub and myself, was the subject of my talk yesterday at the Australian Statistical Conference.
Code
Here’s the code that does the simulation above:
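What follows is a scaled-down sketch of a simulation of this kind, not the original listing. It assumes the forecast package is installed, uses assumed coefficients (ar = 0.5, ma = 0.5) and an arbitrary seed, and runs only 50 replications rather than 10,000 so that it finishes quickly. With so few replications the coverage estimates will be noisy.

```r
library(forecast)

set.seed(123)   # arbitrary seed
n_sims  <- 50   # far fewer than 10,000, to keep the run time short
n_train <- 90
h       <- 10

# The two fitting strategies being compared
fit_fns <- list(
  Arima      = function(x) Arima(x, order = c(1, 1, 1)),  # order known
  auto.arima = function(x) auto.arima(x)                  # order estimated
)

# hit[i, k, method, level] is TRUE if the interval contained the true value
hit <- array(NA, dim = c(n_sims, h, 2, 2),
             dimnames = list(NULL, paste0("h", 1:h),
                             names(fit_fns), c("80%", "95%")))

for (i in seq_len(n_sims)) {
  # Assumed data generating process: ARIMA(1,1,1) with ar = 0.5, ma = 0.5
  y <- ts(cumsum(arima.sim(model = list(ar = 0.5, ma = 0.5), n = n_train + h)))
  train <- window(y, end = n_train)
  test  <- as.numeric(window(y, start = n_train + 1))

  for (m in names(fit_fns)) {
    fc <- tryCatch(forecast(fit_fns[[m]](train), h = h, level = c(80, 95)),
                   error = function(e) NULL)
    if (is.null(fc)) next  # skip the occasional fit that fails; stays NA

    # Columns of $lower and $upper are ordered by level: 1 = 80%, 2 = 95%
    for (l in 1:2) {
      hit[i, , m, l] <- test > fc$lower[, l] & test < fc$upper[, l]
    }
  }
}

# Proportion of intervals containing the true value, by horizon, method, level
coverage <- apply(hit, c(2, 3, 4), mean, na.rm = TRUE)

print(round(coverage[, , "80%"], 2))
print(round(coverage[, , "95%"], 2))
```

Each row of the printed matrices is a forecast horizon and each column a method, so the values can be compared directly against the nominal 80% and 95% levels.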