Time series cross-validation 3
[This article was first published on Modern Toolmaking, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve updated my time-series cross validation algorithm to fix some bugs and allow for a possible xreg term. This allows for cross-validation of multivariate models, so long as they are specified as a function with the following paramters: x (the series to model), xreg (independent variables, optional), newxreg (xregs for the forecast), and h (the number of periods to forecast). Note that h should equal the number of rows in the xreg matrix. Also note that you need to forecast the xreg object BEFORE forecasting your x object. For example, if you wish to forecast 12 months into the future, your xreg object should have 12 extra rows.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is the source code for the new function:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Function to cross-validate a time series. | |
cv.ts <- function(x, FUN, tsControl, xreg=NULL, ...) { | |
#Load required packages | |
stopifnot(is.ts(x)) | |
stopifnot(is.data.frame(xreg) | is.matrix(xreg) | is.null(xreg)) | |
stopifnot(require(forecast)) | |
stopifnot(require(foreach)) | |
stopifnot(require(plyr)) | |
#Load parameters from the tsControl list | |
stepSize <- tsControl$stepSize | |
maxHorizon <- tsControl$maxHorizon | |
minObs <- tsControl$minObs | |
fixedWindow <- tsControl$fixedWindow | |
summaryFunc <- tsControl$summaryFunc | |
#Make sure xreg object is long enough for last set of forecasts | |
if (! is.null(xreg)) { | |
xreg <- as.matrix(xreg) | |
if (nrow(xreg)<length(x)+maxHorizon) { | |
warning('xreg object too short to forecast beyond the length of the time series. | |
Appending NA values to xreg') | |
nRows <- (length(x)+maxHorizon)-nrow(xreg) | |
nCols <- dim(xreg)[2] | |
addRows <- matrix(rep(NA,nCols*nRows),nrow=nRows, ncol=nCols) | |
colnames(addRows) <- colnames(xreg) | |
xreg <- rbind(xreg,addRows) | |
} | |
} | |
#Define additional parameters | |
freq <- frequency(x) | |
n <- length(x) | |
st <- tsp(x)[1]+(minObs-2)/freq | |
#Create a matrix of actual values. | |
#X is the point in time, Y is the forecast horizon | |
#http://stackoverflow.com/questions/8140577/creating-a-matrix-of-future-values-for-a-time-series | |
formatActuals <- function(x,maxHorizon) { | |
actuals <- outer(seq_along(x), seq_len(maxHorizon), FUN="+") | |
actuals <- apply(actuals,2,function(a) x[a]) | |
actuals | |
} | |
actuals <- formatActuals(x,maxHorizon) | |
actuals <- actuals[minObs:(length(x)-1),,drop=FALSE] | |
#Create a list of training windows | |
#Each entry of this list will be the same length, if fixed=TRUE | |
#At each point in time, calculate 'maxHorizon' forecasts ahead | |
steps <- seq(1,(n-minObs),by=stepSize) | |
forcasts <- foreach(i=steps, .combine=rbind, .multicombine=FALSE) %dopar% { | |
if (is.null(xreg)) { | |
if (fixedWindow) { | |
xshort <- window(x, start=st+(i-minObs+1)/freq, end=st+i/freq) | |
} else { | |
xshort <- window(x, end=st + i/freq) | |
} | |
return(FUN(xshort, h=maxHorizon, ...)) | |
} else if (! is.null(xreg)) { | |
if (fixedWindow) { | |
xshort <- window(x, start=st+(i-minObs+1)/freq, end=st+i/freq) | |
xregshort <- xreg[((i):(i+minObs-1)),,drop=FALSE] | |
} else { | |
xshort <- window(x, end=st + i/freq) | |
xregshort <- xreg[(1:(i+minObs-1)),,drop=FALSE] | |
} | |
newxreg <- xreg[(i+minObs):(i+minObs-1+maxHorizon),,drop=FALSE] | |
return(FUN(xshort, h=maxHorizon, xreg=xregshort, newxreg=newxreg, ...)) | |
} | |
} | |
#Extract the actuals we actually want to use | |
actuals <- actuals[steps,,drop=FALSE] | |
#Accuracy at each horizon | |
out <- data.frame( | |
ldply(1:maxHorizon, | |
function(horizon) { | |
P <- forcasts[,horizon,drop=FALSE] | |
A <- na.omit(actuals[,horizon,drop=FALSE]) | |
P <- P[1:length(A)] | |
P <- na.omit(P) | |
A <- A[1:length(P)] | |
summaryFunc(P,A) | |
} | |
) | |
) | |
#Add average accuracy, across all horizons | |
overall <- colMeans(out) | |
out <- rbind(out,overall) | |
#Add a column for which horizon and output | |
return(data.frame(horizon=c(1:maxHorizon,'All'),out)) | |
} |
And here is an example, using a linear model with xregs:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Wrapper functions | |
tsSummary <- function(P,A) { | |
data.frame(t(accuracy(P,A))) | |
} | |
lmForecast <- function(x,h,xreg=NULL,newxreg=NULL,...) { | |
require(forecast) | |
x <- data.frame(x) | |
colnames(x) <- 'x' | |
if (is.null(xreg) & is.null(newxreg)) { | |
fit <- tslm(x ~ trend + season, data=x, ...) | |
return(forecast(fit, h=h, level=99)$mean) | |
} else if ((!is.null(xreg)) & !(is.null(newxreg))) { | |
newnames <- c('x',colnames(xreg)) | |
x <- cbind(x,xreg) | |
colnames(x) <- newnames | |
fmla <- as.formula(paste("x ~ trend + season +", paste(colnames(xreg), collapse= "+"))) | |
fit <- tslm(fmla, data=x) | |
return(forecast(fit, h=h, level=99, newdata=newxreg)$mean) | |
} else { | |
stop('xreg and newxreg must both be NULL or both be provided') | |
} | |
} | |
arimaForecast <- function(x,h,xreg=NULL,newxreg=NULL,...) { | |
fit <- Arima(x, xreg=xreg, ...) | |
forecast(fit, h=h, level=99, xreg=newxreg)$mean | |
} | |
#Create xregs | |
library(forecast) | |
X <- fourier(ldeaths,3) | |
#Cross-validate models | |
myControl <- list( minObs=12, | |
stepSize=1, | |
maxHorizon=12, | |
fixedWindow=FALSE, | |
summaryFunc=tsSummary | |
) | |
lmResult <- cv.ts(ldeaths, lmForecast, myControl, xreg=X) | |
arimaResult <- cv.ts(ldeaths, arimaForecast, myControl, order=c(1,0,1), method="ML", xreg=X) | |
#Examine results | |
lmResult | |
arimaResult | |
plot(lmResult[1:12,'MAE'], type='l') | |
lines(arimaResult[1:12,'MAE'], col=2) |
I am particularly excited about this code because it will allow me to apply arbitrary machine learning algorithms to forecasting problems. For example, I could create an xreg matrix of lags and use a support vector machine, neural network, or random forest to make 1-step forecasts. I am planning to release this code as a package on CRAN, once I finish the documentation. I’m also planning to re-work the function a bit to return an S3 class, containing:
- Predicted values at each forecast horizon, including beyond the length of the input time series.
- Actual values at each forecast horizon, for easy comparison to #1.
- Matrix of average error at each horizon.
- The final model.
- Forecasts using the final model, from the last observation of x to the “max horizon”.
- A print method that will show #3.
- A plot method that will plot #3.
Let me know if you have any suggestions or spot any bugs!
To leave a comment for the author, please follow the link and comment on their blog: Modern Toolmaking.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.