Inference for AR(p) Time Series

arthur charpentier

8 years ago

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Consider a (stationary) autoregressive process, say of order 2,

for some white noise with variance . Here is a code to generate such a process,

> phi1=.25
> phi2=.7
> n=1000
> set.seed(1)
> e=rnorm(n)
> Z=rep(0,n)
> for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t]
> Z=Z[800:1000]
> n=length(Z)
> plot(Z,type="l")

Here, we have to estimate two sets of parameters: the autoregressive coefficients, and the variance of the innovation process . Several techniques can be used to estimate those parameters.

using least square regression

A natural idea is to see here a regression model, since (if we consider a matrix formulation)

Here we can run (conditional) ordinary least squares estimation,

> base=data.frame(Y=Z[3:n],X1=Z[2:(n-1)],X2=Z[1:(n-2)])
> regression=lm(Y~0+X1+X2,data=base)
> summary(regression)

Call:
lm(formula = Y ~ 0 + X1 + X2, data = base)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0268 -0.7063  0.1065  0.6925  3.2566 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
X1  0.23400    0.05463   4.283 2.88e-05 ***
X2  0.62863    0.05476  11.479  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.062 on 197 degrees of freedom
Multiple R-squared:  0.6349,	Adjusted R-squared:  0.6312 
F-statistic: 171.3 on 2 and 197 DF,  p-value: < 2.2e-16

so we get the following estimators, for the autocorrelation coefficients, and the volatility of the noise

> regression$coefficients
       X1        X2 
0.2339959 0.6286321 
> summary(regression)$sigma
[1] 1.061839

using Yule-Walker equations

As we’ve seen in class, we can easily get the following equations for the autocovariance functions,

which can also be written (again, using a matrix expression)

So we just have to solve a simple linear system of equations. Note that if we divide by the variance, those equations can be written in terms of the autocorrelation functions

The code is the following

> rho1=cor(Z[1:(n-1)],Z[2:n])
> rho2=cor(Z[1:(n-2)],Z[3:n])
> A=matrix(c(1,rho1,rho1,1),2,2)
> b=matrix(c(rho1,rho2),2,1)
> (PHI=solve(A,b))
          [,1]
[1,] 0.2256270
[2,] 0.6315329

Now, we need to extract the estimated innovation process, from this set of parameters

> estWN=base$Y-(PHI[1]*base$X1+PHI[2]*base$X2)
> sd(estWN)
[1] 1.058558

This estimator is probably not the best one (we can take into account that we’ve lost two degrees of freedom), but as a starting point, let us consider this one.

An alternative could be to include the variance term in Yule-Walker equations, to get a three dimensional linear equation,

It is not much more complicated to solve, actually,

> gamma0=var(Z[1:n])
> gamma1=var(Z[1:(n-1)],Z[2:n])
> gamma2=var(Z[1:(n-2)],Z[3:n])
> A=matrix(c(gamma1,gamma0,gamma1,gamma2,gamma1,gamma0,1,0,0),3,3)
> b=matrix(c(gamma0,gamma1,gamma2),3,1)
> (PHISIGMA=solve(A,b))
          [,1]
[1,] 0.2283151
[2,] 0.6283431
[3,] 1.1335501

using (conditional) likelihood estimators

Finally, we can assume some distribution for the innovation process. The standard model is a Gaussian model, i.e.

has a Gaussian distribution

In that case, the conditional log likelihood (conditional since we set the first two observations here) is

> CondLogLik=function(A,TS){
+ phi1=A[1];  phi2=A[2]
+ sigma=A[3]; L=0
+ for(t in 3:length(TS)){
+ L=L+dnorm(TS[t],mean=phi1*TS[t-1]+
+ phi2*TS[t-2],sd=sigma,log=TRUE)}
+ return(-L)}

Now, we can run standard optimization procedures,

> LogL=function(A) CondLogLik(A,TS=Z)
> optim(c(0,0,1),LogL)
$par
[1] 0.2339589 0.6285002 1.0565613

$value
[1] 293.3042

$counts
function gradient 
     106       NA 

$convergence
[1] 0

$message
NULL

It is also possible to consider a global maximum likelihood optimisation problem, since the variance matrix of vector has a know form.

using (unconditional) likelihood estimators

The variance matrix of is , where autocovariances are not not know, be can easily be computed using a recursive relationship.

> library(mnormt)
> GlobalLogLik=function(A,TS){
+ n=length(TS)
+ phi1=A[1];  phi2=A[2]
+ sigma=A[3]
+ SIG=matrix(0,n,n)
+ rho=rep(0,n)
+ rho[1]=1
+ rho[2]=phi1/(1-phi2)
+ for(h in 3:n) rho[h]=phi1*rho[h-1]+phi2*rho[h-2]
+ for(i in 1:n){for(j in 1:n){
+ SIG[i,j]=rho[abs(i-j)+1]}}
+ gamma0=(1-phi2)*sigma^2/((1+phi2)*((1-phi2)^2-phi1^2))
+ SIG=gamma0*SIG
+ return(dmnorm(TS,rep(0,n),SIG,log=TRUE))}
> LogL=function(A) -GlobalLogLik(A,TS=Z)
> optim(c(.1,.1,1),LogL)
Error in chol.default(x, pivot = FALSE) : 
Error in pd.solve(varcov, log.det = TRUE) : 
  x appears to be not positive definite

The problem is that there is a strong constraint on the pair to get a stationary process (we are not far away, here, from the border of the triangle, where the process become non stationary). To be more specific (this was mentioned in a previous post), we should have

i.e. in a standard matrix form

(we can add an additional constraint on the variance parameter, to insure that it will be positive). To run a contrained optimization routine, consider

> U=matrix(c(1,0,0,-1,0,1,0,-1,0,0,1,0),4,3)
> C=c(0,0,0,-.99999)
> constrOptim(c(.1,.1,1),LogL,grad=NULL,ui=U,ci=C)
$par
[1] 0.2238892 0.6342850 1.0613388

$value
[1] 297.9202

$counts
function gradient 
     108       NA 

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 2

$barrier.value
[1] 0.000189892

(here, to faster, we restrain the parameters so that they will be positive).

comparing those estimates

Here, our five estimators are rather close. Let us run more samples to see more precisely how they behave. For the first parameter , we get

and for the second one, , we have

The bias we observe is probably coming from the fact that, with this numerical example, we are not far away from the non-stationary case (the sum of the true parameters should be less than 1, and it is 0.95). When we estimate the parameters, we force them to be inside the triangle, since those parameters can be estimated only if the process is stationary.

Observe that the standard-deviation of the innovation process is here, well estimated,

(with clearly some estimators that perform better than others).

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.