Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As discussed in the MAT8181 course, there are – at least – two kinds of non-stationary time series: those with a trend, and those with a unit-root (they will be called integrated). Unit root tests cannot be used to assess whether a time series is stationary, or not. They can only detect integrated time series. And the same holds for seasonal unit root.
In a previous post, we’ve seen that it was difficult to model hourly temperature, since most test do not reject unit roots. Consider here the average monthly temperature, still in Montréal, QC.
> montreal=read.table("http://freakonometrics.free.fr/temp-montreal-monthly.txt") > M=as.matrix(montreal[,2:13]) > X=as.numeric(t(M)) > tsm=ts(X,start=1948,freq=12) > plot(tsm)
For those who don’t know Montréal, Winter and Summer are very different. We can visualize monthly differences using
> month=rep(1:12,length(tsm)/12) > plot(month,as.numeric(tsm)) > lines(1:12,apply(M,2,mean),col="red",type="b",pch=19)
or, if install the uroot package, removed from the CRAN repository, we can use
> library(uroot) > bbplot(tsm)
or
> bb3D(tsm) Loading required package: tcltk
It looks like our time series is cyclic, because of the yearly seasonal pattern. The autocorrelation function is here
> acf(tsm,lag=120)
Again, this cycle can be visualized using
> persp(1948:2013,1:12,M,theta=-50,col="yellow",shade=TRUE, + xlab="Year",ylab="Month",zlab="Temperature",ticktype="detailed")
Now, the question is is there a seasonal unit root ? This would mean that our model should be
If we forget about the autoregressive and the moving average component, we can estimate
If there is a seasonal unit root then
> arima(tsm,order=c(0,0,0),seasonal=list(order=c(1,0,0),period=12)) Call: arima(x = tsm, order = c(0, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12)) Coefficients: sar1 intercept 0.9702 6.4566 s.e. 0.0071 2.1515
It is not far away from 1. Actually, it cannot be too close to 1. If it was, then we would get an error message…
To illustrate some interesting models, let us consider also quarterly temperatures,
> N=cbind(apply(montreal[,2:4],1,sum),apply(montreal[,5:7],1,sum),apply(montreal[,8:10],1,sum),apply(montreal[,11:13],1,sum)) > X=as.numeric(t(N)) > tsq=ts(X,start=1948,freq=4) > persp(1948:2013,1:4,N,theta=-50,col="yellow",shade=TRUE, + xlab="Year",ylab="Quarter",zlab="Temperature",ticktype="detailed")
(again, the aim is just to be able to write down some equations, if necessary)
Why not consider a
i.e.
where
> library(vars) > df=data.frame(N) > names(df)=paste("y",1:4,sep="") > model=VAR(df) > model VAR Estimation Results: ======================= Estimated coefficients for equation y1: ======================================= Call: y1 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const y1.l1 y2.l1 y3.l1 y4.l1 const -0.13943065 0.21451118 0.08921237 0.30362065 -34.74793931 Estimated coefficients for equation y2: ======================================= Call: y2 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const y1.l1 y2.l1 y3.l1 y4.l1 const 0.02520938 0.05288958 -0.13277377 0.05134148 40.68955266 Estimated coefficients for equation y3: ======================================= Call: y3 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const y1.l1 y2.l1 y3.l1 y4.l1 const 0.07740824 -0.21142726 0.11180518 0.12963931 56.81087283 Estimated coefficients for equation y4: ======================================= Call: y4 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const y1.l1 y2.l1 y3.l1 y4.l1 const 0.18842863 -0.31964579 0.25099508 -0.04452577 5.73228873
and matrix
> A=rbind( + coefficients(model$varresult$y1)[1:4], + coefficients(model$varresult$y2)[1:4], + coefficients(model$varresult$y3)[1:4], + coefficients(model$varresult$y4)[1:4]) > A y1.l1 y2.l1 y3.l1 y4.l1 [1,] -0.13943065 0.21451118 0.08921237 0.30362065 [2,] 0.02520938 0.05288958 -0.13277377 0.05134148 [3,] 0.07740824 -0.21142726 0.11180518 0.12963931 [4,] 0.18842863 -0.31964579 0.25099508 -0.04452577
Since stationary of this multiple time series is closely related to eigenvalues of this matrix, let us look at them,
> eigen(A)$values [1] 0.35834830 -0.32824657 -0.14042175 0.09105836 > Mod(eigen(A)$values) [1] 0.35834830 0.32824657 0.14042175 0.09105836
So it looks like there is no stationarity issue, here. A restricted model is the periodic autoregressive model, so called
where
and
Keep in mind that this is a
This model can be estimated using a specific package (one can also look at the vignette, to get a better understanding of the syntax)
> library(partsm) > detcomp <- list(regular=c(0,0,0), seasonal=c(1,0), regvar=0) > model=fit.ar.par(wts=tsq, detcomp=detcomp, type="PAR", p=1) > PAR.MVrepr(model) ---- Multivariate representation of a PAR model. Phi0: 1.000 0.000 0.000 0 -0.242 1.000 0.000 0 0.000 -0.261 1.000 0 0.000 0.000 -0.492 1 Phi1: 0 0 0 0.314 0 0 0 0.000 0 0 0 0.000 0 0 0 0.000 Eigen values of Gamma = Phi0^{-1} %*% Phi1: 0.01 0 0 0 Time varing accumulation of shocks: 0.010 0.040 0.155 0.314 0.002 0.010 0.037 0.076 0.001 0.003 0.010 0.020 0.000 0.001 0.005 0.010
Here, the characteristic equation is
so there is a (seasonal) unit root if
Which is clearly not the case, here. It is possible to perform Canova-Hansen test,
> CH.test(tsq) ------ - ------ ---- Canova & Hansen test ------ - ------ ---- Null hypothesis: Stationarity. Alternative hypothesis: Unit root. Frequency of the tested cycles: pi/2 , pi , L-statistic: 1.122 Lag truncation parameter: 5 Critical values: 0.10 0.05 0.025 0.01 0.846 1.01 1.16 1.35
The idea is that polynomial
since
If we get back to monthly data,
each of them having different interpretations.
Here we can have 1 cycle per year (on 12 months), 2 cycles per year (on 6 months), 3 cycles per year (on 4 months), 4 cycles per year (on 3 months), even 6 cycles per year (on 2 months). This will depend on the argument of the root, with respectively
The output of the test is here
> CH.test(tsm) ------ - ------ ---- Canova & Hansen test ------ - ------ ---- Null hypothesis: Stationarity. Alternative hypothesis: Unit root. Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi , L-statistic: 1.964 Lag truncation parameter: 20 Critical values: 0.10 0.05 0.025 0.01 2.49 2.75 2.99 3.27
And it looks like we reject the assumption of a seasonal unit root. I can even mention the following testing procedure
> library(forecast) > nsdiffs(tsm, test="ch") [1] 0
where the ouput: “1″ means that there is a seasonal unit root and “0″ that there is no seasonal unit root. Simple to read, isn’t it? If we consider the periodic autoregressive model on the monthly data, the output is
> model=fit.ar.par(wts=tsm, detcomp=detcomp, type="PAR", p=1) > model ---- PAR model of order 1 . y_t = alpha_{1,s}*y_{t-1} + alpha_{2,s}*y_{t-2} + ... + alpha_{p,s}*y_{t-p} + coeffs*detcomp + epsilon_t, for s=1,2,...,12 ---- Autoregressive coefficients. s=1 s=2 s=3 s=4 s=5 s=6 s=7 s=8 s=9 s=10 s=11 s=12 alpha_1s 0.15 0.05 0.07 0.33 0.11 0 0.3 0.38 0.31 0.19 0.15 0.37
So, whatever the test, we always reject the assumption that there is a seasonal unit root. Which does not mean that we can not have a strong cycle! Actually, the series is almost periodic. But there is no unit root! So all of this makes sense (I hardly believe that there might be unit root – seasonal, or not – in temperatures).
Just to make sure that we get it right, consider two times series, inspired from the previous one. The first one is a periodic sequence (with a very very small noise, just to avoid problem of non-definite matrices) and the second one is clearly integrated.
> Xp1=Xp2=as.numeric(t(M)) > for(t in 13:length(M)){ + Xp1[t]=Xp1[t-12] + Xp2[t]=Xp2[t-12]+rnorm(1,0,2) + } > Xp1=Xp1+rnorm(length(Xp1),0,.02) > tsp1=ts(Xp1,start=1948,freq=12) > tsp2=ts(Xp2,start=1948,freq=12) > par(mfrow=c(2,1)) > plot(tsp1) > plot(tsp2)
see also
> par(mfrow=c(1,2)) > bb3D(tsp1) > bb3D(tsp2)
If we quickly look at those series, I would say that the first one has no unit root – even if it is not stationary, but it is because the series is periodic – while there is (are ?) unit root(s) for the second one. If we look at Canova-Hansen test, we get
> CH.test(tsp1) ------ - ------ ---- Canova & Hansen test ------ - ------ ---- Null hypothesis: Stationarity. Alternative hypothesis: Unit root. Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi , L-statistic: 2.234 Lag truncation parameter: 20 Critical values: 0.10 0.05 0.025 0.01 2.49 2.75 2.99 3.27 > CH.test(tsp2) ------ - ------ ---- Canova & Hansen test ------ - ------ ---- Null hypothesis: Stationarity. Alternative hypothesis: Unit root. Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi , L-statistic: 5.448 Lag truncation parameter: 20 Critical values: 0.10 0.05 0.025 0.01 2.49 2.75 2.99 3.27
I know that this package has been removed, so maybe I should not use it. Consider instead
> nsdiffs(tsp1, 12,test="ch") [1] 0 > nsdiffs(tsp2, 12,test="ch") [1] 1
Here we have the same conclusion. The first one does not have a unit root, but the second one has. But be careful: with Osborn-Chui-Smith-Birchenhall test,
> nsdiffs(tsp1, 12,test="ocsb") [1] 1 > nsdiffs(tsp2, 12,test="ocsb") [1] 1
we have the feeling that there is also a unit root in our cyclic series…
So here, on a short-frequency, we do reject the assumption of a unit root – even a seasonal one – on our temperature series. We still have our high-frequency problem to deal with, some day (but I don’t think I’ll have enough time to introduce long range dependence this session, unfortunately).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.