Consider a (stationary) autoregressive process, say of order 2,
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t$$
for some white noise $(\varepsilon_t)$, with variance $\sigma^2$.
> phi1=.25
> phi2=.7
> n=1000
> set.seed(1)
> e=rnorm(n)
> Z=rep(0,n)
> for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t]
> Z=Z[800:1000]
> n=length(Z)
> plot(Z,type="l")
Here, we have to estimate two sets of parameters: the autoregressive coefficients $\phi_1$ and $\phi_2$, and the variance of the white noise, $\sigma^2$.
- using least squares regression
A natural idea is to see here a regression model, since (if we consider a matrix formulation)
$$\begin{pmatrix} X_3 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} X_2 & X_1 \\ \vdots & \vdots \\ X_{n-1} & X_{n-2} \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
Here we can run (conditional) ordinary least squares estimation,
> base=data.frame(Y=Z[3:n],X1=Z[2:(n-1)],X2=Z[1:(n-2)])
> regression=lm(Y~0+X1+X2,data=base)
> summary(regression)

Call:
lm(formula = Y ~ 0 + X1 + X2, data = base)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0268 -0.7063  0.1065  0.6925  3.2566 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
X1  0.23400    0.05463   4.283 2.88e-05 ***
X2  0.62863    0.05476  11.479  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.062 on 197 degrees of freedom
Multiple R-squared: 0.6349,	Adjusted R-squared: 0.6312 
F-statistic: 171.3 on 2 and 197 DF,  p-value: < 2.2e-16
so we get the following estimators of the autoregressive coefficients, and of the volatility of the noise,
> regression$coefficients
       X1        X2 
0.2339959 0.6286321 
> summary(regression)$sigma
[1] 1.061839
- using Yule-Walker equations
As we’ve seen in class, we can easily get the following equations for the autocovariance functions,
$$\begin{cases} \gamma(1) = \phi_1\, \gamma(0) + \phi_2\, \gamma(1) \\ \gamma(2) = \phi_1\, \gamma(1) + \phi_2\, \gamma(0) \end{cases}$$
which can also be written (again, using a matrix expression)
$$\begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}$$
So we just have to solve a simple linear system of equations. Note that if we divide by the variance $\gamma(0)$, the system can be written in terms of the autocorrelations,
$$\begin{pmatrix} \rho(1) \\ \rho(2) \end{pmatrix} = \begin{pmatrix} 1 & \rho(1) \\ \rho(1) & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}$$
The code is the following
> rho1=cor(Z[1:(n-1)],Z[2:n])
> rho2=cor(Z[1:(n-2)],Z[3:n])
> A=matrix(c(1,rho1,rho1,1),2,2)
> b=matrix(c(rho1,rho2),2,1)
> (PHI=solve(A,b))
          [,1]
[1,] 0.2256270
[2,] 0.6315329
Now, we need to extract the estimated innovation process, from this set of parameters
> estWN=base$Y-(PHI[1]*base$X1+PHI[2]*base$X2)
> sd(estWN)
[1] 1.058558
This estimator is probably not the best one (we can take into account that we’ve lost two degrees of freedom), but as a starting point, let us consider this one.
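For instance, a small sketch of one possible adjustment, dividing the sum of squared residuals by the number of observations minus two, in the spirit of the residual standard error above (the name sigma_adj is purely illustrative, not part of the original code),

> # volatility estimate adjusted for the two estimated coefficients
> # (sigma_adj is an illustrative name, not from the original post)
> sigma_adj=sqrt(sum(estWN^2)/(length(estWN)-2))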
An alternative could be to include the variance term in the Yule-Walker equations, to get a three-dimensional linear system,
$$\begin{cases} \gamma(0) = \phi_1\, \gamma(1) + \phi_2\, \gamma(2) + \sigma^2 \\ \gamma(1) = \phi_1\, \gamma(0) + \phi_2\, \gamma(1) \\ \gamma(2) = \phi_1\, \gamma(1) + \phi_2\, \gamma(0) \end{cases}$$
It is not much more complicated to solve, actually,
> gamma0=var(Z[1:n])
> gamma1=var(Z[1:(n-1)],Z[2:n])
> gamma2=var(Z[1:(n-2)],Z[3:n])
> A=matrix(c(gamma1,gamma0,gamma1,gamma2,gamma1,gamma0,1,0,0),3,3)
> b=matrix(c(gamma0,gamma1,gamma2),3,1)
> (PHISIGMA=solve(A,b))
          [,1]
[1,] 0.2283151
[2,] 0.6283431
[3,] 1.1335501
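Note that the third component of the solution is an estimate of the variance $\sigma^2$, not of the standard deviation, so a quick square root (roughly 1.065 with the values above) makes it comparable with the previous estimates of the volatility,

> # the third component estimates the variance of the noise,
> # so take a square root to compare with the previous estimates
> sqrt(PHISIGMA[3,1])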
- using (conditional) likelihood estimators
Finally, we can assume some distribution for the innovation process. The standard model is a Gaussian model, i.e. $\varepsilon_t$ has a $\mathcal{N}(0,\sigma^2)$ distribution.
In that case, the conditional log-likelihood (conditional since we set the first two observations here) is
$$\log\mathcal{L}(\phi_1,\phi_2,\sigma) = \sum_{t=3}^{n} \log\left[ \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2})^2}{2\sigma^2} \right) \right]$$
> CondLogLik=function(A,TS){
+   phi1=A[1];  phi2=A[2]
+   sigma=A[3]; L=0
+   for(t in 3:length(TS)){
+     L=L+dnorm(TS[t],mean=phi1*TS[t-1]+
+       phi2*TS[t-2],sd=sigma,log=TRUE)}
+   return(-L)}
Now, we can run standard optimization procedures,
> LogL=function(A) CondLogLik(A,TS=Z)
> optim(c(0,0,1),LogL)
$par
[1] 0.2339589 0.6285002 1.0565613

$value
[1] 293.3042

$counts
function gradient 
     106       NA 

$convergence
[1] 0

$message
NULL
It is also possible to consider a global maximum likelihood optimisation problem, since the variance matrix of the vector $\boldsymbol{X}=(X_1,\dots,X_n)$ has a simple, explicit expression.
- using (unconditional) likelihood estimators
The variance matrix of $\boldsymbol{X}=(X_1,\dots,X_n)$ is $\boldsymbol{\Sigma}=[\gamma(|i-j|)]_{i,j} = \gamma(0)\,[\rho(|i-j|)]_{i,j}$, where the autocorrelations satisfy the recursion $\rho(h)=\phi_1\rho(h-1)+\phi_2\rho(h-2)$, with $\rho(0)=1$, $\rho(1)=\phi_1/(1-\phi_2)$, and where
$$\gamma(0) = \frac{(1-\phi_2)\,\sigma^2}{(1+\phi_2)\,\big[(1-\phi_2)^2-\phi_1^2\big]}$$
> library(mnormt)
> GlobalLogLik=function(A,TS){
+   n=length(TS)
+   phi1=A[1];  phi2=A[2]
+   sigma=A[3]
+   SIG=matrix(0,n,n)
+   rho=rep(0,n)
+   rho[1]=1
+   rho[2]=phi1/(1-phi2)
+   for(h in 3:n) rho[h]=phi1*rho[h-1]+phi2*rho[h-2]
+   for(i in 1:n){for(j in 1:n){
+     SIG[i,j]=rho[abs(i-j)+1]}}
+   gamma0=(1-phi2)*sigma^2/((1+phi2)*((1-phi2)^2-phi1^2))
+   SIG=gamma0*SIG
+   return(dmnorm(TS,rep(0,n),SIG,log=TRUE))}
> LogL=function(A) -GlobalLogLik(A,TS=Z)
> optim(c(.1,.1,1),LogL)
Error in chol.default(x, pivot = FALSE) : 
Error in pd.solve(varcov, log.det = TRUE) : 
  x appears to be not positive definite
The problem is that there is a strong constraint on the pair $(\phi_1,\phi_2)$: for the process to be stationary, it has to lie in the triangle
$$\phi_1+\phi_2<1,\qquad \phi_2-\phi_1<1,\qquad \vert\phi_2\vert<1,$$
i.e. in a standard matrix form
$$\begin{pmatrix} -1 & -1 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} \geq \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}$$
(we can add an additional constraint on the variance parameter, to ensure that it will be positive). To run a constrained optimization routine, consider
> U=matrix(c(1,0,0,-1,0,1,0,-1,0,0,1,0),4,3)
> C=c(0,0,0,-.99999)
> constrOptim(c(.1,.1,1),LogL,grad=NULL,ui=U,ci=C)
$par
[1] 0.2238892 0.6342850 1.0613388

$value
[1] 297.9202

$counts
function gradient 
     108       NA 

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 2

$barrier.value
[1] 0.000189892
(here, to speed things up, we restrict the parameters so that they will be positive).
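If we prefer not to force the autoregressive coefficients to be positive, a possible sketch uses the full stationarity triangle instead (the names Ui and Ci are purely illustrative, and LogL is the function defined above),

> # constraints: phi1+phi2<1, phi2-phi1<1, phi2>-1, and sigma>0,
> # written in the ui %*% theta >= ci form used by constrOptim
> Ui=rbind(c(-1,-1,0),c(1,-1,0),c(0,1,0),c(0,0,1))
> Ci=c(-.99999,-.99999,-.99999,1e-6)
> constrOptim(c(.1,.1,1),LogL,grad=NULL,ui=Ui,ci=Ci)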
- comparing those estimates
Here, our five estimators are rather close. Let us run simulations on more samples to see more precisely how they behave, looking at the distribution of the estimates of the first parameter, and of the second one.
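As a rough sketch of such a simulation exercise (the function estim and the number of replications are purely illustrative, and only the least squares and Yule-Walker estimators of the autoregressive coefficients are kept, to keep the code short),

> # simulate many samples and compare two of the estimators
> estim=function(seed){
+   set.seed(seed)
+   e=rnorm(1000); X=rep(0,1000)
+   for(t in 3:1000) X[t]=.25*X[t-1]+.7*X[t-2]+e[t]
+   X=X[800:1000]; n=length(X)
+   base=data.frame(Y=X[3:n],X1=X[2:(n-1)],X2=X[1:(n-2)])
+   ols=lm(Y~0+X1+X2,data=base)$coefficients
+   rho1=cor(X[1:(n-1)],X[2:n]); rho2=cor(X[1:(n-2)],X[3:n])
+   yw=solve(matrix(c(1,rho1,rho1,1),2,2),c(rho1,rho2))
+   c(ols,yw)}
> SIM=t(sapply(1:500,estim))
> boxplot(data.frame(OLS=SIM[,1],YW=SIM[,3]),main="estimates of phi1")
> boxplot(data.frame(OLS=SIM[,2],YW=SIM[,4]),main="estimates of phi2")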
The bias we observe is probably coming from the fact that, with this numerical example, we are not far away from the non-stationary case (the sum of the true parameters should be less than 1, and it is 0.95). When we estimate the parameters, we force them to be inside the triangle, since those parameters can be estimated only if the process is stationary.
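As a small illustration, a simple helper function can check whether a pair of coefficients lies inside that triangle (the name in_triangle is purely illustrative),

> # stationarity region of an AR(2): phi1+phi2<1, phi2-phi1<1, |phi2|<1
> in_triangle=function(phi1,phi2) (phi1+phi2<1)&(phi2-phi1<1)&(abs(phi2)<1)
> in_triangle(.25,.7)
[1] TRUE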
Observe also how the estimates of the standard-deviation of the innovation process are distributed (with, clearly, some estimators that perform better than others).