Time series cross-validation 3

[This article was first published on Modern Toolmaking, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve updated my time-series cross validation algorithm to fix some bugs and allow for a possible xreg term.     This allows for cross-validation of multivariate models, so long as they are specified as a function with the following paramters: x (the series to model), xreg (independent variables, optional), newxreg (xregs for the forecast), and h (the number of periods to forecast).  Note that h should equal the number of rows in the xreg matrix.  Also note that you need to forecast the xreg object BEFORE forecasting your x object.  For example, if you wish to forecast 12 months into the future, your xreg object should have 12 extra rows.

Here is the source code for the new function:
#Function to cross-validate a time series.
cv.ts <- function(x, FUN, tsControl, xreg=NULL, ...) {
#Load required packages
stopifnot(is.ts(x))
stopifnot(is.data.frame(xreg) | is.matrix(xreg) | is.null(xreg))
stopifnot(require(forecast))
stopifnot(require(foreach))
stopifnot(require(plyr))
#Load parameters from the tsControl list
stepSize <- tsControl$stepSize
maxHorizon <- tsControl$maxHorizon
minObs <- tsControl$minObs
fixedWindow <- tsControl$fixedWindow
summaryFunc <- tsControl$summaryFunc
#Make sure xreg object is long enough for last set of forecasts
if (! is.null(xreg)) {
xreg <- as.matrix(xreg)
if (nrow(xreg)<length(x)+maxHorizon) {
warning('xreg object too short to forecast beyond the length of the time series.
Appending NA values to xreg')
nRows <- (length(x)+maxHorizon)-nrow(xreg)
nCols <- dim(xreg)[2]
addRows <- matrix(rep(NA,nCols*nRows),nrow=nRows, ncol=nCols)
colnames(addRows) <- colnames(xreg)
xreg <- rbind(xreg,addRows)
}
}
#Define additional parameters
freq <- frequency(x)
n <- length(x)
st <- tsp(x)[1]+(minObs-2)/freq
#Create a matrix of actual values.
#X is the point in time, Y is the forecast horizon
#http://stackoverflow.com/questions/8140577/creating-a-matrix-of-future-values-for-a-time-series
formatActuals <- function(x,maxHorizon) {
actuals <- outer(seq_along(x), seq_len(maxHorizon), FUN="+")
actuals <- apply(actuals,2,function(a) x[a])
actuals
}
actuals <- formatActuals(x,maxHorizon)
actuals <- actuals[minObs:(length(x)-1),,drop=FALSE]
#Create a list of training windows
#Each entry of this list will be the same length, if fixed=TRUE
#At each point in time, calculate 'maxHorizon' forecasts ahead
steps <- seq(1,(n-minObs),by=stepSize)
forcasts <- foreach(i=steps, .combine=rbind, .multicombine=FALSE) %dopar% {
if (is.null(xreg)) {
if (fixedWindow) {
xshort <- window(x, start=st+(i-minObs+1)/freq, end=st+i/freq)
} else {
xshort <- window(x, end=st + i/freq)
}
return(FUN(xshort, h=maxHorizon, ...))
} else if (! is.null(xreg)) {
if (fixedWindow) {
xshort <- window(x, start=st+(i-minObs+1)/freq, end=st+i/freq)
xregshort <- xreg[((i):(i+minObs-1)),,drop=FALSE]
} else {
xshort <- window(x, end=st + i/freq)
xregshort <- xreg[(1:(i+minObs-1)),,drop=FALSE]
}
newxreg <- xreg[(i+minObs):(i+minObs-1+maxHorizon),,drop=FALSE]
return(FUN(xshort, h=maxHorizon, xreg=xregshort, newxreg=newxreg, ...))
}
}
#Extract the actuals we actually want to use
actuals <- actuals[steps,,drop=FALSE]
#Accuracy at each horizon
out <- data.frame(
ldply(1:maxHorizon,
function(horizon) {
P <- forcasts[,horizon,drop=FALSE]
A <- na.omit(actuals[,horizon,drop=FALSE])
P <- P[1:length(A)]
P <- na.omit(P)
A <- A[1:length(P)]
summaryFunc(P,A)
}
)
)
#Add average accuracy, across all horizons
overall <- colMeans(out)
out <- rbind(out,overall)
#Add a column for which horizon and output
return(data.frame(horizon=c(1:maxHorizon,'All'),out))
}
view raw cv.ts.R hosted with ❤ by GitHub


And here is an example, using a linear model with xregs:
#Wrapper functions
tsSummary <- function(P,A) {
data.frame(t(accuracy(P,A)))
}
lmForecast <- function(x,h,xreg=NULL,newxreg=NULL,...) {
require(forecast)
x <- data.frame(x)
colnames(x) <- 'x'
if (is.null(xreg) & is.null(newxreg)) {
fit <- tslm(x ~ trend + season, data=x, ...)
return(forecast(fit, h=h, level=99)$mean)
} else if ((!is.null(xreg)) & !(is.null(newxreg))) {
newnames <- c('x',colnames(xreg))
x <- cbind(x,xreg)
colnames(x) <- newnames
fmla <- as.formula(paste("x ~ trend + season +", paste(colnames(xreg), collapse= "+")))
fit <- tslm(fmla, data=x)
return(forecast(fit, h=h, level=99, newdata=newxreg)$mean)
} else {
stop('xreg and newxreg must both be NULL or both be provided')
}
}
arimaForecast <- function(x,h,xreg=NULL,newxreg=NULL,...) {
fit <- Arima(x, xreg=xreg, ...)
forecast(fit, h=h, level=99, xreg=newxreg)$mean
}
#Create xregs
library(forecast)
X <- fourier(ldeaths,3)
#Cross-validate models
myControl <- list( minObs=12,
stepSize=1,
maxHorizon=12,
fixedWindow=FALSE,
summaryFunc=tsSummary
)
lmResult <- cv.ts(ldeaths, lmForecast, myControl, xreg=X)
arimaResult <- cv.ts(ldeaths, arimaForecast, myControl, order=c(1,0,1), method="ML", xreg=X)
#Examine results
lmResult
arimaResult
plot(lmResult[1:12,'MAE'], type='l')
lines(arimaResult[1:12,'MAE'], col=2)
view raw test.R hosted with ❤ by GitHub


I am particularly excited about this code because it will allow me to apply arbitrary machine learning algorithms to forecasting problems.  For example, I could create an xreg matrix of lags and use a support vector machine, neural network, or random forest to make 1-step forecasts.  I am planning to release this code as a package on CRAN, once I finish the documentation.  I’m also planning to re-work the function a bit to return an S3 class, containing:

  1. Predicted values at each forecast horizon, including beyond the length of the input time series.
  2. Actual values at each forecast horizon, for easy comparison to #1.
  3. Matrix of average error at each horizon.
  4. The final model.
  5. Forecasts using the final model, from the last observation of x to the “max horizon”.
  6. A print method that will show #3.
  7. A plot method that will plot #3.
Let me know if you have any suggestions or spot any bugs!


To leave a comment for the author, please follow the link and comment on their blog: Modern Toolmaking.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)