Inference for ARMA(p,q) Time Series
As we mentioned in our previous post, as soon as we have a moving average part, inference becomes more complicated. Again, to illustrate, we do not need a too general model. Consider, here, some ARMA(1,1) process,

$$X_t = \phi X_{t-1} + \varepsilon_t + \theta\,\varepsilon_{t-1},$$

where $(\varepsilon_t)$ is some white noise, and assume further that $|\phi|<1$, so that the autoregressive part is stationary.
> theta=.7
> phi=.5
> n=1000
> Z=rep(0,n)
> set.seed(1)
> e=rnorm(n)
> for(t in 2:n) Z[t]=phi*Z[t-1]+e[t]+theta*e[t-1]
> Z=Z[800:1000]
> plot(Z,type="l")
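As a sanity check (this is not part of the original code), essentially the same kind of trajectory can be simulated with R's built-in arima.sim function, with the same autoregressive and moving average coefficients,

# hypothetical cross-check, using stats::arima.sim instead of the explicit loop
sim = arima.sim(model = list(ar = .5, ma = .7), n = 200)
plot(sim, type = "l")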
- A two-step procedure
To start with something simple, assume that we did miss the moving average component, and consider some AR(1) model instead,

$$X_t = \phi X_{t-1} + u_t.$$

The least squares estimator of $\phi$ is then no longer consistent. But still, we can compute it,
> base=data.frame(Y=Z[2:n],X=Z[1:(n-1)])
> regression=lm(Y~0+X,data=base)
> summary(regression)

Call:
lm(formula = Y ~ 0 + X, data = base)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2445 -0.7909  0.0626  0.9707  3.0685

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
X  0.69571    0.05101   13.64   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.225 on 199 degrees of freedom
  (799 observations deleted due to missingness)
Multiple R-squared:  0.4832, Adjusted R-squared:  0.4806
F-statistic:   186 on 1 and 199 DF,  p-value: < 2.2e-16
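As a side note (not in the original post), the lack of consistency can be quantified: the least squares slope in that regression converges to the lag-one autocorrelation of the process, $\rho(1)=\frac{(1+\phi\theta)(\phi+\theta)}{1+2\phi\theta+\theta^2}$, and not to $\phi$. A quick computation with the true parameters,

phi = .5; theta = .7
# theoretical lag-one autocorrelation of the ARMA(1,1) process
rho1 = (1 + phi*theta)*(phi + theta)/(1 + 2*phi*theta + theta^2)
rho1
# about 0.74, much closer to the estimate obtained above (0.696) than to phi = 0.5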
and then, we can compute the autocorrelation of the noise,
> n=200
> cor(residuals(regression)[2:n],residuals(regression)[1:(n-1)])
[1] 0.2663076
or, more formally, use the Durbin-Watson estimator to get the autocorrelation of the noise (and some significance test),
> library(car)
> durbinWatsonTest(regression)
 lag Autocorrelation D-W Statistic p-value
   1       0.2656555       1.46323       0
 Alternative hypothesis: rho != 0
The point, here, is that we would like to assume that

$$u_t = \varepsilon_t + \theta\,\varepsilon_{t-1},$$

meaning that $(u_t)$ should be some MA(1) process. And, for an MA(1) process,

$$\rho_u(1) = \frac{\theta}{1+\theta^2},$$

i.e. $\theta$ is a root of this quadratic problem, $\rho_u(1)\,\theta^2 - \theta + \rho_u(1) = 0$,
> polyroot(c(1,-1/cor(residuals(regression)[2:n],residuals(regression)[1:(n-1)]),1))
[1] 0.2884681+0i 3.4665883+0i
Here, we do have two positive roots. I would go for the one smaller than one, in order to be able to invert the polynomial, if necessary…
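To automate that choice (a small sketch, not part of the original code), one can simply keep the root with modulus smaller than one, so that the moving average polynomial can be inverted,

rho1 = cor(residuals(regression)[2:n], residuals(regression)[1:(n-1)])
roots = polyroot(c(1, -1/rho1, 1))
# keep the root inside the unit circle (invertible moving average part)
theta_hat = Re(roots[which.min(Mod(roots))])
theta_hat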
- Use of the empirical autocorrelation function
An alternative might be to use properties of the autocovariance function of the process,

$$\gamma(0) = \sigma^2\,\frac{1 + 2\phi\theta + \theta^2}{1-\phi^2},\qquad \gamma(1) = \phi\,\gamma(0) + \theta\,\sigma^2,$$

and

$$\gamma(2) = \phi\,\gamma(1).$$

Again, we have a set of three equations, with three unknown parameters, $\phi$, $\theta$ and $\sigma$. Numerically, it is possible to find roots. If we run the code, we get
> v=c(as.numeric(acf(Z)$acf[2:3]),1)*var(Z)
> library(rootSolve)
> seteq=function(x){
+ F1=v[1]-x[3]^2*(x[2]^2+2*x[1]*x[2]+1)/(1-x[1]^2)
+ F2=v[2]-(x[1]*v[1]+x[2]*x[3]^2)
+ F3=v[3]-x[1]*v[2]
+ return(c(F1,F2,F3))}
> multiroot(f=seteq,start=c(.1,.1,1))
$root
[1]  3.643734 -3.188145  1.427759

$f.root
[1]  1.371170e-11 -3.714573e-11  0.000000e+00

$iter
[1] 8

$estim.precis
[1] 1.695248e-11
Here, we have a situation… the roots are far from the true values $(\phi,\theta,\sigma)=(0.5,0.7,1)$. Note that the empirical autocovariances are stored in v in the order $(\gamma(1),\gamma(2),\gamma(0))$, while the equations in seteq assume the order $(\gamma(0),\gamma(1),\gamma(2))$, which probably explains the trouble.
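For what it's worth, here is a sketch (not the author's code) of the same moment equations, but with the empirical autocovariances stored in the order $(\gamma(0),\gamma(1),\gamma(2))$, consistently with the three relations above,

library(rootSolve)
# empirical autocovariances gamma(0), gamma(1), gamma(2)
g = as.numeric(acf(Z, plot = FALSE)$acf[1:3])*var(Z)
seteq2 = function(x){
  phi = x[1]; theta = x[2]; sigma = x[3]
  c(g[1] - sigma^2*(theta^2 + 2*phi*theta + 1)/(1 - phi^2),   # gamma(0)
    g[2] - (phi*g[1] + theta*sigma^2),                        # gamma(1)
    g[3] - phi*g[2])                                          # gamma(2)
}
multiroot(f = seteq2, start = c(.1, .1, 1))$root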
- Use of least squares techniques
We can use, here, the algorithm described previously in the context of $AR(p)$ processes.
> V=function(p){
+ phi=p[1]
+ theta=p[2]
+ u=rep(0,length(Z))
+ for(t in 2:length(Z)) u[t]=Z[t]-phi*Z[t-1]-theta*u[t-1]
+ return(sum(u^2))
+ }
> p=optim(par=c(.1,.1),V)$par
[1] 0.3637783 0.7773845
> coef=c(p,sqrt(V(p)/(length(Z))))
which is not so bad. Actually, we can run that procedure on 1,000 simulated samples to see how the two estimators behave; a sketch of such an experiment is given below.
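A sketch of that Monte Carlo loop (not the author's original code) could be the following,

# 1,000 simulated ARMA(1,1) samples, each estimated by the
# conditional sum of squares procedure above
phi = .5; theta = .7; n = 1000
estim = t(replicate(1000, {
  e = rnorm(n)
  X = rep(0, n)
  for(t in 2:n) X[t] = phi*X[t-1] + e[t] + theta*e[t-1]
  X = X[800:1000]
  V = function(p){
    u = rep(0, length(X))
    for(t in 2:length(X)) u[t] = X[t] - p[1]*X[t-1] - p[2]*u[t-1]
    sum(u^2)
  }
  optim(par = c(.1, .1), V)$par
}))
colMeans(estim)   # to be compared with (0.5, 0.7)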
- Use of maximum likelihood techniques
Last, but not least, one more time, we can use (global) maximum likelihood techniques, since the process is a Gaussian process (all finite-dimensional vectors have a joint Gaussian distribution) if we assume that the noise is Gaussian.
> library(mnormt)
> GlobalLogLik=function(A,TS){
+ n=length(TS)
+ phi=A[1]; theta=A[2]
+ sigma=A[3]
+ SIG=matrix(0,n,n)
+ rho=rep(0,n)
+ rho[1]=sigma^2*(theta^2+2*phi*theta+1)/(1-phi^2)
+ rho[2]=phi*rho[1]+theta*sigma^2
+ for(h in 3:n) rho[h]=phi*rho[h-1]
+ for(i in 1:n){for(j in 1:n){
+ SIG[i,j]=rho[abs(i-j)+1]}}
+ return(dmnorm(TS,rep(0,n),SIG,log=TRUE))}
> LogL=function(A) -GlobalLogLik(A,TS=Z)
> optim(c(.1,.1,1),LogL)$par
[1] 0.3890991 0.7672036 1.0731340
It works fine, one more time. But maybe we got lucky here. We've seen, in the post on autoregressive time series, that the algorithm might fail if the time series is not stationary. In order to avoid such problems, we can consider a constrained optimization problem, where we simply recall that $|\phi|<1$,
> U=matrix(c(1,-1,0,0,0,0),2,3)
> C=-c(.999,.999)
> constrOptim(c(.1,.1,1),LogL,grad=NULL,ui=U,ci=C)
$par
[1] 0.3890991 0.7672036 1.0731340

$value
[1] 300.1956

$counts
function gradient
     118       NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 2

$barrier.value
[1] -1.536358e-05
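As a quick cross-check (not in the original post), those values can be compared with the output of R's built-in exact maximum likelihood estimator for the same model,

# comparison with the built-in estimator (exact Gaussian likelihood)
arima(Z, order = c(1, 0, 1), include.mean = FALSE)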
If we run that algorithm 1,000 times on simulated time series (with the same parameters), we can again look at the distribution of the estimators.