This Monday, in our MAT8181 class, we discussed seasonal unit roots from a practical perspective (the theory will be briefly mentioned in a few weeks, once we have seen multivariate models). Consider the following time series,
> autoroute=read.table(
+ "http://freakonometrics.blog.free.fr/public/data/autoroute.csv",
+ header=TRUE,sep=";")
> X=autoroute$a100
> T=1:length(X)
> plot(T,X,type="l",xlim=c(0,120))
> reg=lm(X~T)
> abline(reg,col="red")
As discussed in a previous post, if there is a trend, we should remove it and work on the residuals,
> Y=residuals(reg)
> acf(Y,lag=36,lwd=3)
We can observe some seasonality here. A first strategy might be to assume that there is a seasonal unit root, so we consider the seasonally differenced series,
> Z=diff(Y,12)
> acf(Z,lag=36,lwd=3)
or the partial autocorrelation function
> pacf(Z,lag=36,lwd=3)
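Incidentally, the need for a seasonal difference can also be checked automatically, for instance with nsdiffs from the forecast package (a quick sketch, not part of the original analysis; the exact test it applies depends on the package version):

library(forecast)
# nsdiffs estimates the number of seasonal differences required;
# the detrended series must first be given a frequency of 12
nsdiffs(ts(Y, frequency = 12))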
The autocorrelation function might suggest an MA(1) structure, while the partial autocorrelation function might suggest an AR(1) process. Let us try both.
> model1=arima(Z,order=c(0,0,1))
> model1

Call:
arima(x = Z, order = c(0, 0, 1))

Coefficients:
          ma1  intercept
      -0.2367  -583.7761
s.e.   0.0916   254.8805

sigma^2 estimated as 8071255:  log likelihood = -684.1,  aic = 1374.2

> E1=residuals(model1)
> acf(E1,lag=36,lwd=3)
which can be considered as white noise (if you are not convinced, try either a Box-Pierce or a Ljung-Box test; see the sketch below). Similarly,
> model2=arima(Z,order=c(1,0,0))
> model2

Call:
arima(x = Z, order = c(1, 0, 0))

Coefficients:
          ar1  intercept
      -0.3214  -583.0943
s.e.   0.1112   248.8735

sigma^2 estimated as 7842043:  log likelihood = -683.07,  aic = 1372.15

> E2=residuals(model2)
> acf(E2,lag=36,lwd=3)
which can also be considered as white noise.
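For a formal check rather than eyeballing the autocorrelograms, a Ljung-Box (or Box-Pierce) test on the residuals can be run with Box.test from base R (a minimal sketch; the choices lag=12 and fitdf=1 are mine):

# large p-values mean we cannot reject the white-noise hypothesis;
# fitdf=1 since one ARMA coefficient was estimated in each model
Box.test(E1, lag=12, type="Ljung-Box", fitdf=1)
Box.test(E2, lag=12, type="Ljung-Box", fitdf=1)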
So what we have, so far, is either
$$(1-B^{12})Y_t=\varepsilon_t+\theta\,\varepsilon_{t-1}\qquad\text{or}\qquad(1-B^{12})Y_t=\phi\,(1-B^{12})Y_{t-1}+\varepsilon_t$$
for some white noise $(\varepsilon_t)$. The autoregressive version can be fitted directly on the detrended series, by specifying a seasonal difference in arima,
> model2b=arima(Y,order=c(1,0,0),
+ seasonal = list(order = c(0, 1, 0),
+ period=12))
> model2b

Call:
arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(0, 1, 0), period = 12))

Coefficients:
          ar1
      -0.2715
s.e.   0.1130

sigma^2 estimated as 8412999:  log likelihood = -685.62,  aic = 1375.25
So far, so good. Now, what if there is no seasonal unit root, but simply a large autoregressive coefficient at the seasonal lag, in some AR structure? Let us try something like
$$(1-\phi\,B)(1-\Phi\,B^{12})\,Y_t=\varepsilon_t$$
where a natural guess is that the seasonal coefficient $\Phi$ should, probably, be close to one. Let us try this one,
> model3c=arima(Y,order=c(1,0,0),
+ seasonal = list(order = c(1, 0, 0),
+ period = 12))
> model3c

Call:
arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))

Coefficients:
          ar1    sar1  intercept
      -0.1629  0.9741  -684.9455
s.e.   0.1170  0.0115  3064.4040

sigma^2 estimated as 8406080:  log likelihood = -816.11,  aic = 1640.21
The seasonal autoregressive coefficient is indeed extremely close to one (0.9741), and the fit is comparable with what we got previously (somehow), so we might consider this model an interesting one. We will discuss below the fact that the first coefficient might be considered non-significant.
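To see why, one can compute the usual z-statistics directly from the fitted object (a quick sketch; arima stores the estimated covariance matrix of the coefficients in var.coef):

# estimate divided by standard error; values within (-1.96, 1.96)
# are non-significant at the 5% level
model3c$coef / sqrt(diag(model3c$var.coef))
# ar1:  -0.1629/0.1170, about -1.39, hence non-significant
# sar1:  0.9741/0.0115, extremely significant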
What is the difference between those two models? On a short-term horizon, they are quite comparable. To visualize their forecasts, consider the following function,
> library(forecast)
> previ=function(model,h=36,b=40000){
+ # forecasts over horizon h (the observed series has 85 points)
+ prev=forecast(model,h)
+ T=1:85
+ Tfutur=86:(85+h)
+ plot(T,Y,type="l",xlim=c(0,85+h),ylim=c(-b,b))
+ # 95% (orange) and 80% (yellow) prediction bands
+ polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,2],rev(prev$upper[,2])),col="orange",border=NA)
+ polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,1],rev(prev$upper[,1])),col="yellow",border=NA)
+ # point forecast in blue, bounds of the 95% band in red
+ lines(prev$mean,col="blue")
+ lines(Tfutur,prev$lower[,2],col="red")
+ lines(Tfutur,prev$upper[,2],col="red")
+ }
Now, from a (very) long-term perspective, the models are quite different: one is stationary, so the forecast will tend to the average value (here 0, since the trend was removed), while the other one is (seasonally) integrated, so the confidence interval will keep widening. For the nonstationary one, we get
> previ(model2b,600,b=60000)
and for the stationary one
> previ(model3c,600,b=60000)
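The two graphs can be summarized numerically by comparing the widths of the 95% forecast intervals at the end of the horizon (a quick sketch, with variable names of my own):

p2=forecast(model2b,600)  # seasonally integrated model
p3=forecast(model3c,600)  # stationary model
tail(p2$upper[,2]-p2$lower[,2],1)  # keeps growing with the horizon
tail(p3$upper[,2]-p3$lower[,2],1)  # stabilizes at a finite width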
But as mentioned in the introduction of this course, forecasts with those models are relevant only over a short-term horizon (say, not too large). And in that case, the predictions are almost the same here,
> previ(model2b,36,b=60000)
> previ(model3c,36,b=60000)
Now, if we come back to that second model (the stationary one), we did mention previously that its autoregressive coefficient might be considered non-significant. What if we remove it?
> model3d=arima(Y,order=c(0,0,0),
+ seasonal = list(order = c(1, 0, 0),
+ period = 12))
> model3d

Call:
arima(x = Y, order = c(0, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))

Coefficients:
        sar1  intercept
      0.9662  -696.5661
s.e.  0.0134  3182.3017

sigma^2 estimated as 8918630:  log likelihood = -817.03,  aic = 1640.07
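The log-likelihood barely moves when the coefficient is dropped, which a quick likelihood-ratio check confirms (a sketch, using the loglik component stored by arima):

# twice the log-likelihood gap, to compare with the chi-square cutoff
2*(model3c$loglik - model3d$loglik)  # about 2*(-816.11+817.03) = 1.84
qchisq(0.95, df=1)                   # 3.84, so ar1 is indeed not significant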
If we look at a (short-term) forecast, we get
> previ(model3d,36,b=32000)
Do you see any difference? To be honest, I don’t… If we look at the figures, we get
> cbind(forecast(model2b,12)$mean,
+       forecast(model3c,12)$mean,
+       forecast(model3d,12)$mean)
Time Series:
Start = 86
End = 97
Frequency = 1
 1   -4908.4920  -5092.8999  -5520.8780
 2  -10012.7837  -9640.8103  -9493.0339
 3   -3880.2202  -3841.1960  -3828.2611
 4  -18102.5211 -17638.4086 -17499.1828
 5  -20602.7346 -20090.9117 -19934.1066
 6  -10463.2212 -10209.0139 -10132.0439
 7    2458.1538   2376.4897   2351.2377
 8   -1680.3342  -1654.4844  -1647.0057
 9     876.6837    836.2342    823.4934
10   18046.5642  17561.6520  17413.1463
11   21531.4820  20956.3451  20780.2836
12   -3217.6103  -3152.0446  -3132.4112
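Those point forecasts differ by at most a few hundred units, which is small compared with the width of the forecast intervals (a quick sketch to check, with variable names of my own):

p2=forecast(model2b,12)
p3d=forecast(model3d,12)
max(abs(p2$mean-p3d$mean))      # largest gap between point forecasts
min(p2$upper[,2]-p2$lower[,2])  # the narrowest 95% interval is far wider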
Figures are different, but not significantly (keep in mind the size of the confidence intervals). This might explain why, in R, when we ask for an autoregressive process of order…