Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is the story of a subtle error that, to my opinion, is a nice example of the special challenges of statistical programming. One of my main research interests is time series with long memory. These are often modeled by fractionally integrated models, where
Here
To generate a fractionally integrated series, we can bring the fractional differencing filter to the other side:
An important special case of this is the random walk model where
Using the same stream of random number
rm(list=ls()) set.seed(54321) library(fracdiff) series<-rnorm(1000) a<-diffseries(series,d=-1) b<-cumsum(series) ts.plot(a, ylim=c(min(a,b),max(a,b))) lines(b, col=2) legend(x="bottomleft", col=c(1,2), lty=c(1,1), bty="n", legend=c("RW generated with cumsum()", "RW generated with diffseries()"))
As one can see from the graph above, the two series a and b that should be identical diverge faster from each other the longer the series becomes.
Recently I was reminded of this curious behavior when I was trying to implement some new statistical procedures for a research paper and they refused to work until I replaced diffseries() with the function fast_fracdiff(), that was proposed as a faster alternative for fractional differencing by Jensen and Nielsen (2014) and makes use of the convolution theorem.
fast_fracdiff <- function(x, d){ iT <- length(x) np2 <- nextn(2*iT - 1, 2) k <- 1:(iT-1) b <- c(1, cumprod((k - d - 1)/k)) dx <- fft(fft(c(b, rep(0, np2 - iT))) * fft(c(x, rep(0, np2 - iT))), inverse = T) / np2; return(Re(dx[1:iT])) } c<-fast_fracdiff(series, d=-1) ts.plot(a, ylim=c(min(a,b),max(a,b))) lines(b, col=2) lines(c, col=3) legend(x="bottomleft", col=c(1,2,3), lty=c(1,1,1), bty="n", legend=c("RW generated with cumsum()", "RW generated with diffseries()", "RW generated with fast_fracdiff()"))
As one can see, using fast_fracdiff() which can be found on the university webpage of Morten Nielsen with
diffseries <- function(x, d) { x <- as.data.frame(x) names(x) <- "series" x 1) stop("only implemented for univariate time series") if (any(is.na(x))) stop("NAs in x") n = 2) x <- x - mean(x) PI <- numeric(n) PI[1] <- -d for (k in 2:n) { PI[k] <- PI[k-1]*(k - 1 - d)/k } ydiff <- x for (i in 2:n) { ydiff[i] <- x[i] + sum(PI[1:(i-1)]*x[(i-1):1]) } ## return numeric! ydiff }
As one can see above, in the line x<-x-mean(x) the inpust series is de-meaned prior to the fractional differentiation. This is because in a model with non-zero means, we have
where
x<-fracdiff.sim(n=1000, d=0.4)$series+10 mean(x) ## [1] 9.702761 y<-diffseries(x,0.4) z<-fast_fracdiff(x,0.4) mean(y) ## [1] -0.004302338 mean(z) ## [1] 0.6807775
But what causes the behavior in the first graph above? Why do the series drift apart? Here the input series is the series of innovations
If we use diffseries() to integrate the white noise sequence, these are demeaned before they are integrated.
As you can see from the last equation, this produces a random walk with drift, where the drift parameter is given by the mean of the of the innovations.
What do we lean from this? The de-meaning in diffseries() will probably not cause problems in most use cases. However, I certainly think it is problematic since the expected value
But there is another point to make here. It is obvious that diffseries() is written with the differentiation in mind. Using it to integrate – even though it should be theoretically possible – goes beyond the use cases that the developers had in mind. Errors caused by using the function this way are hard to spot, since the generated series is still a random walk and the slight drift is hard to spot unless it is compared with the cumsum() function. They could go unnoticed for years. This highlights very nicely the extra degree of care that is necessary for statistical programming.
Refrences
Hosking, J. R. (1981). Fractional differencing. Biometrika, 165-176.
Jensen, A. N., & Nielsen, M. Ø. (2014). A fast fractional difference algorithm. Journal of Time Series Analysis, 35(5), 428-436.
Filed under: R
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.