Site icon R-bloggers

Forecasting In R: A New Hope with AR(10)

[This article was first published on The Dancing Economist, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us.  Although i didn’t show its residuals failed to pass the acf and pacf test for white noise and the mean of its residuals was greater than three when it should have been much closer to zero.
Today we discover that an AR(10) of the de-trended GDP series may be the best option at hand.  Normally when we do model selection we start with the one that has the lowest AIC and then proceed to test the error terms (or residuals) for white noise. Lets take a look at the model specs for the AR(10):


> model7<-arima(dt,order=c(10,0,0))

> model7

Call:
arima(x = dt, order = c(10, 0, 0))

Coefficients:
         ar1      ar2            ar3       ar4           ar5     ar6           ar7      ar8        ar9     ar10
      1.5220  -0.4049  -0.2636  0.2360  -0.2132  0.1227  -0.0439  -0.0958  0.3244  -0.2255
s.e.  0.0604   0.1105   0.1131  0.1143   0.1154  0.1147   0.1139   0.1127  0.1111   0.0627
      intercept
       -21.6308
s.e.    57.5709

sigma^2 estimated as 1452:  log likelihood = -1307.76,  aic = 2639.52

The Ljung-Box Q test checks out to the 20th lag: 

> Box.test(model7$res,lag=20,type=”Ljung-Box”)

Box-Ljung test

data:  model7$res 
X-squared = 15.0909, df = 20, p-value = 0.7712

It even checks out to the 30th lag! I changed the way Iplotted the Ljung-Box Q values because after finding this little function called “LBQPlot” which makes life way easier.

> LBQPlot(model7$res,lag.max=30,StartLag=1)
Most importantly both the ACF and the PACF of the residuals check out for the white noise process.  In the ARIMA(2,2,2) model these weren’t even close to what we wanted them to be.

> par(mfrow=c(2,1))
> acf(model7$res)
> pacf(model7$res)



Unfortunately, our residuals continue to fail the formal tests for normality. I don’t really know what to do about this or even what the proper explanation is, but I have a feeling that these tests are very very sensitive. 

> jarque.bera.test(model7$res)

Jarque Bera Test

data:  model7$res 
X-squared = 7507.325, df = 2, p-value < 2.2e-16

> shapiro.test(model7$res)

Shapiro-Wilk normality test

data:  model7$res 
W = 0.7873, p-value < 2.2e-16

The mean is also considerably closer to 0 but just not quite there.

> mean(model7$res)
[1] 0.7901055

Take a look at the plot for normality:

> qqnorm(model7$res)
> qqline(model7$res)
In the next post we are going to test how good our model actually is.  Today we found our optimal choice in terms of model specs, but we also should see how well it can forecast past values of GDP.  In addition to evaluating our models past accuracy we will also practice forecasting into the future.  As you continue to read these posts, you should be getting significantly better with R- I know I am!  We have covered many new codes that stem from many different libraries. I would like to keep on doing analysis in R so after we finish forecasting GDP I think I may move on to some Econometrics. Please keep forecasting and most certainly keep dancin’,

Steven J.

To leave a comment for the author, please follow the link and comment on their blog: The Dancing Economist.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.