Story of the Ljung-Box Blues: Progress Not Perfection
[This article was first published on The Dancing Economist, and kindly contributed to R-bloggers].
In the last post we determined that our ARIMA(2,2,2) model failed to pass the Ljung-Box test. In today’s post we seek to completely discredit the last post’s claim and finally arrive at some needed closure.
The Ljung-Box test is first performed on the series at hand: rejecting its null hypothesis means that at least one of the autocorrelations is nonzero. What does that mean? Well, it means that we can forecast, because the values in the series can be used to predict each other. It helps us come to the conclusion numerically that the series itself is not a white noise process, and so its movements are not completely random.
When we perform the Ljung-Box in R on GDP we get the following results:
> Box.test(GDP,lag=20,type="Ljung-Box")
Box-Ljung test
data: GDP
X-squared = 4086.741, df = 20, p-value < 2.2e-16
What this output is telling us is to reject the null hypothesis that all of the autocorrelations out to lag 20 are zero. At least one of them is nonzero. This gives us the green light to use AR, MA or ARMA in our approach towards modeling and forecasting.
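If you are curious where that X-squared number comes from, here is a minimal sketch that builds the Ljung-Box statistic by hand, Q = n(n+2) Σ ρ_k²/(n−k) summed over lags 1 to h, and compares it with Box.test. The series `y` is a simulated random walk standing in for GDP (an assumption, since the post's data isn't reproduced here), so the numbers will not match the output above.

```r
# Simulated stand-in for a persistent series (NOT the GDP data from the post)
set.seed(42)
y <- cumsum(rnorm(200))        # random walk: strongly autocorrelated

n <- length(y)
h <- 20                        # number of lags, as in the post

# Sample autocorrelations at lags 1..h (element 1 of $acf is lag 0, so drop it)
rho <- acf(y, lag.max = h, plot = FALSE)$acf[-1]

# Ljung-Box statistic and its chi-squared p-value with h degrees of freedom
Q <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
p <- pchisq(Q, df = h, lower.tail = FALSE)

# Compare against the built-in test
bt <- Box.test(y, lag = h, type = "Ljung-Box")
all.equal(Q, unname(bt$statistic))
```

A large Q (and so a tiny p-value) is what you expect from any series whose neighbouring values move together.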
The second time the Ljung-Box shows up is when we want to test to see if the error terms or residuals are white noise. A good forecasting model will have to have zero correlation between its residuals or else you could forecast them. It naturally follows that if you can forecast the error terms then a better model must exist.
Here is the Ljung-Box Q test out to the 26th Lag:
> LjungBoxTest(res,k=2,StartLag=1)
m Qm p-value:
1 0.05 0.82118640
2 0.05 0.81838128
3 0.72 0.39541957
4 0.75 0.68684256
5 2.00 0.57224678
6 2.41 0.66164894
7 3.24 0.66255593
8 9.05 0.17070965
9 15.14 0.03429650
10 15.54 0.04946816
11 15.64 0.07487629
12 22.14 0.01442010
13 22.51 0.02073827
14 22.72 0.03020402
15 23.24 0.03889525
16 23.24 0.05648292
17 23.29 0.07809501
18 26.81 0.04367819
19 30.20 0.02494375
20 30.20 0.03554725
21 31.56 0.03500150
22 32.46 0.03868275
23 32.47 0.05241222
24 34.14 0.04748629
25 35.47 0.04672181
26 36.28 0.05151986
As you can see with your very special eyes, we fail to reject the null hypothesis at the 5% level out to the 8th lag (several of the longer lags, such as 9, 10 and 12 through 15, do dip below 0.05). At the shorter lags, then, we have no evidence of residual autocorrelation and hence no evidence to contradict the assumption that the errors are white noise. Our model checks out, people!
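A table like the one above can also be sketched with base R alone. This is a swapped-in approach, not the LjungBoxTest function used in the post: it loops Box.test over a range of lags and passes the number of fitted coefficients through `fitdf`, so the degrees of freedom at lag m are m − k. The AR(2) series, `fit`, and `tab` are all made up for illustration.

```r
# Simulated AR(2) series and fit (hypothetical; not the post's GDP model)
set.seed(1)
y   <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 300)
fit <- arima(y, order = c(2, 0, 0))
res <- residuals(fit)

k    <- 2              # number of fitted AR/MA coefficients
lags <- (k + 1):26     # need lag > k so that df = lag - k is positive

# Ljung-Box p-value at each lag, with degrees of freedom reduced by k
pvals <- sapply(lags, function(m)
  Box.test(res, lag = m, type = "Ljung-Box", fitdf = k)$p.value)

tab <- data.frame(m = lags, p.value = round(pvals, 4))
head(tab)

# Lags at which the white-noise null is rejected at the 5% level
tab$m[tab$p.value < 0.05]
```

Scanning the last line is a quicker way to spot the troublesome lags than eyeballing the whole column.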
Now if you want to plot the Ljung-Box just type in the following:
> x<-LjungBoxTest(res,k=2,StartLag=1)
> plot(x[,3],main="Ljung-Box Q Test",ylab="P-values",xlab="Lag")
The white noise process should also have a normal distribution with a mean of 0. To do a rough test of normality we can run a simple Q-Q plot in R. The values are normal if they rest on a line and aren’t all over the place.
The following command gives us this plot:
qqnorm(res)
qqline(res)
The Q-Q plot seems to suggest normality; however, there are some formal tests we can run in R to verify this assumption. Two formal tests are the Jarque-Bera test (jarque.bera.test, from the tseries package) and the Shapiro-Wilk normality test (shapiro.test, in base R). Both have a null hypothesis that the series follows a normal distribution, and therefore a rejection of the null suggests that the series does not follow a normal distribution.
> jarque.bera.test(res)
Jarque Bera Test
data: res
X-squared = 9660.355, df = 2, p-value < 2.2e-16
> shapiro.test(res)
Shapiro-Wilk normality test
data: res
W = 0.7513, p-value < 2.2e-16
Wow! Both of these tests strongly reject the possibility of the white noise process having a normal distribution.
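To see what drives a Jarque-Bera statistic that large, here is a rough base-R sketch of the calculation: JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis, compared against a chi-squared distribution with 2 degrees of freedom. The helper `jb_stat` and the simulated draws are hypothetical names and data for illustration, not the residuals from the post.

```r
# Hand-rolled Jarque-Bera statistic (illustrative helper, base R only)
jb_stat <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)            # central moments, divisor n
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  S  <- m3 / m2^1.5          # skewness
  K  <- m4 / m2^2            # kurtosis (equals 3 for a normal distribution)
  stat <- n / 6 * (S^2 + (K - 3)^2 / 4)
  c(statistic = stat,
    p.value   = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(7)
normal_draws <- rnorm(500)        # well-behaved sample
heavy_draws  <- rt(500, df = 2)   # heavy tails, a few extreme values

jb_normal <- jb_stat(normal_draws)
jb_heavy  <- jb_stat(heavy_draws)
sw_heavy  <- shapiro.test(heavy_draws)   # Shapiro-Wilk on the same draws

jb_normal
jb_heavy
sw_heavy$p.value
```

Because the kurtosis term is squared, a handful of extreme residuals is enough to send the statistic into the thousands.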
We can still see if the mean of the residuals is zero by simply typing the following into R:
> mean(model$res)
[1] 3.754682
The mean is clearly not zero, which implies we have some sort of problem. In fact, it means that the Ljung-Box test was not the proper test, because it requires:
A. The time series be stationary
B. The white noise process has a normal distribution with mean zero.
Given that we just determined that the mean is definitely not zero and that both of our formal tests rejected the possibility of our white noise process following a normal distribution, we do indeed face a serious problem. This is an evolving and growing period for us forecasting-in-R novices. I don’t have all the answers (clearly), but strides are made in the right direction every day. The greatest thing about making mistakes and tripping in the forest is getting back up and getting the hell out of there.
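One way to put a number on "not zero" is to compare the residual mean with its standard error, which is what a one-sample t-test does. The sketch below uses made-up residuals (`res_sim`, with an assumed standard deviation of 40) rather than the residuals from the model, so read it as a template and not as a verdict on the result above.

```r
# Hypothetical residuals: true mean 0, sd 40 (both values are assumptions)
set.seed(3)
res_sim <- rnorm(250, mean = 0, sd = 40)

m  <- mean(res_sim)                          # sample mean of the residuals
se <- sd(res_sim) / sqrt(length(res_sim))    # standard error of that mean

# One-sample t-test of H0: mean = 0
tt <- t.test(res_sim, mu = 0)

c(mean = m, std.error = se, t = unname(tt$statistic), p.value = tt$p.value)
```

The same two lines, `mean()` and `t.test(..., mu = 0)`, can be pointed at the real residual vector to see how far the mean sits from zero relative to the noise.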
Please keep posted and keep dancin’,
Steven J.