It is common to use a Ljung-Box test to check that the residuals from a time series model resemble white noise. However, there is very little practical advice available about how to choose the number of lags for the test.
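For readers who have not used the test in R, here is a minimal sketch of the usual workflow; the simulated series, the ARIMA(1,0,0) model and the choice of 10 lags are arbitrary illustrations, not recommendations.

# Fit a simple model to a simulated series and test its residuals for autocorrelation.
set.seed(1)
y <- arima.sim(model = list(ar = 0.6), n = 100)
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
# fitdf adjusts the degrees of freedom for the number of estimated parameters
# (here one AR coefficient).
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)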
The Ljung-Box test was proposed by Ljung and Box (Biometrika, 1978) and is based on the statistic
$$Q^* = T(T+2)\sum_{k=1}^{h} \frac{r_k^2}{T-k},$$
where $h$ is the number of lags being tested, $T$ is the length of the time series, and $r_k$ is the $k$th autocorrelation coefficient of the residuals. If the residuals really are white noise, $Q^*$ is approximately $\chi^2$ distributed with $h-K$ degrees of freedom, where $K$ is the number of parameters estimated in the model.
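As a sanity check on the formula, the following sketch computes Q* directly for a white-noise series (no fitted model, so no degrees-of-freedom adjustment) and compares it with the statistic returned by Box.test():

# Compute the Ljung-Box statistic by hand for a white-noise series.
set.seed(123)
x <- rnorm(100)
n <- length(x)   # series length (T in the formula above)
h <- 10
r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # autocorrelations r_1, ..., r_h
Qstar <- n * (n + 2) * sum(r^2 / (n - (1:h)))
Qstar
Box.test(x, lag = h, type = "Ljung-Box")$statistic   # should match Qstar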
In my forecasting textbook with George Athanasopoulos, we recommended using h = 10 for non-seasonal data and h = 2m for seasonal data, where m is the period of seasonality.
I have not seen similar advice in other time series textbooks; most authors seem to ignore such practical considerations. Patrick Burns, in a short online paper, suggests a somewhat different rule. I think the latter is unnecessarily restrictive. In practice, we don't really need the distribution of the test statistic to be exactly chi-squared at every lag; we only need the test to have approximately the correct size.
I have done some simulations to see the effect of different values of h when the data really are white noise. For each of the series lengths T = 20, 50, 100 and 1000, I simulated 10,000 Gaussian white-noise series and computed the Ljung-Box p-value at every lag up to min(100, T). If the test is well calibrated, these p-values should be uniformly distributed on (0,1), which can be checked with a Kolmogorov-Smirnov test. The code at the end of the post plots the KS p-values against the number of lags, one panel per series length.
Clearly, the p-values fail the uniformity test for all values of h that are large relative to T.
However, when we look at the actual size of the test, a different story emerges.
The second set of plots shows the proportion of Ljung-Box p-values less than 0.05 (upper line) and less than 0.01 (lower line) at each lag. This demonstrates that the actual size of the test tends to be larger than the nominal size except for values of h that are small relative to T.
This analysis suggests that my rule-of-thumb could be modified a little, as follows (a short R helper implementing the modified rule is sketched below):

- For non-seasonal time series, use h = min(10, T/5).
- For seasonal time series, use h = min(2m, T/5), where m is the period of seasonality.
It also shows that you should not be too fussy about the boundary. Because the actual size is a little larger than the nominal size for these values of h, the test rejects a true null slightly more often than the significance level suggests, so p-values close to whatever threshold you are using should not be over-interpreted.
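Here is a minimal sketch of the modified rule as an R function. The function name (lb_lags) and its interface are my own illustration, not part of any package.

# Hypothetical helper implementing the modified rule of thumb.
# T is the series length; m is the seasonal period (m = 1 for non-seasonal data).
lb_lags <- function(T, m = 1)
{
  if (m > 1)
    h <- min(2 * m, T / 5)   # seasonal series
  else
    h <- min(10, T / 5)      # non-seasonal series
  max(1, floor(h))           # at least one lag, rounded down to an integer
}

lb_lags(55)          # non-seasonal, T = 55: min(10, 11) = 10
lb_lags(36, m = 12)  # monthly data, T = 36: min(24, 7.2) gives 7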
R code
In case any reader wants to play with some more simulations, here is my R code.
# Simulate 10,000 white-noise series of each length and store the Ljung-Box
# p-values for lags 1 to min(100, T).
lj <- array(NA, c(4, 10000, 100))
size <- c(20, 50, 100, 1000)
for(i in 1:10000)
{
  for(k in 1:4)
  {
    x <- rnorm(size[k])
    for(j in 1:min(100, size[k]))
      lj[k, i, j] <- Box.test(x, type = "Ljung-Box", lag = j)$p.value
  }
}

# Kolmogorov-Smirnov test of uniformity of the p-values
# at each (series length, lag) combination.
ks.pvalue <- function(x)
{
  x <- na.omit(x)
  if(length(x) > 1)
    return(ks.test(x, punif)$p.value)
  else
    return(NA)
}
testsize <- apply(lj, c(1, 3), FUN = ks.pvalue)

# Plot KS p-values against the number of lags for each series length.
par(mfrow = c(2, 2))
plot(1:20, testsize[1, 1:20], type = "l", xlab = "Number of lags", main = "T=20",
     ylab = "KS p-value", ylim = c(0, 1))
abline(h = 0.05, col = "gray")
plot(1:50, testsize[2, 1:50], type = "l", xlab = "Number of lags", main = "T=50",
     ylab = "KS p-value", ylim = c(0, 1))
abline(h = 0.05, col = "gray")
plot(1:100, testsize[3, 1:100], type = "l", xlab = "Number of lags", main = "T=100",
     ylab = "KS p-value", ylim = c(0, 1))
abline(h = 0.05, col = "gray")
plot(1:100, testsize[4, 1:100], type = "l", xlab = "Number of lags", main = "T=1000",
     ylab = "KS p-value", ylim = c(0, 1))
abline(h = 0.05, col = "gray")

# Empirical size: proportion of p-values below the nominal level.
pv <- function(x, threshold = 0.05)
{
  mean(x < threshold, na.rm = TRUE)
}
testsize <- apply(lj, c(1, 3), FUN = pv)
testsize1 <- apply(lj, c(1, 3), FUN = pv, threshold = 0.01)

# Plot actual size (at nominal levels 0.05 and 0.01) against the number of lags.
par(mfrow = c(2, 2))
plot(1:20, testsize[1, 1:20], type = "n", main = "T=20", xlab = "Number of lags",
     ylab = "Actual size", ylim = c(0, 0.12))
axis(2, at = seq(0, 0.1, by = 0.01))
abline(h = c(0.01, 0.05), col = gray(0.6))
lines(1:20, testsize[1, 1:20])
lines(1:20, testsize1[1, 1:20])
plot(1:50, testsize[2, 1:50], type = "n", main = "T=50", xlab = "Number of lags",
     ylab = "Actual size", ylim = c(0, 0.12))
axis(2, at = seq(0, 0.1, by = 0.01))
abline(h = c(0.01, 0.05), col = gray(0.6))
lines(1:50, testsize[2, 1:50])
lines(1:50, testsize1[2, 1:50])
plot(1:100, testsize[3, 1:100], type = "n", main = "T=100", xlab = "Number of lags",
     ylab = "Actual size", ylim = c(0, 0.12))
axis(2, at = seq(0, 0.1, by = 0.01))
abline(h = c(0.01, 0.05), col = gray(0.6))
lines(1:100, testsize[3, 1:100])
lines(1:100, testsize1[3, 1:100])
plot(1:100, testsize[4, 1:100], type = "n", main = "T=1000", xlab = "Number of lags",
     ylab = "Actual size", ylim = c(0, 0.12))
axis(2, at = seq(0, 0.1, by = 0.01))
abline(h = c(0.01, 0.05), col = gray(0.6))
lines(1:100, testsize[4, 1:100])
lines(1:100, testsize1[4, 1:100])
Reference:
Ljung, G. M., and G. E. P. Box (1978). "On a measure of lack of fit in time series models". Biometrika, 65(2), 297-303.