Batch Forecasting in R
Given a data frame whose columns contain time series data, let’s say we want to run an automatic forecasting algorithm on several of those columns. Furthermore, we want to train each model on a fixed number of observations and assess how well it forecasts the held-out values. Based on those test results, we then estimate a model on the full series. This is a fairly simple undertaking, but let’s walk through it. My preference for such procedures is to loop through each column and append the results to a nested list.
First, let’s create some data.
ddat <- data.frame(
  date   = seq(as.Date("2010/01/01"), as.Date("2010/03/02"), by = 1),
  value1 = abs(round(rnorm(61), 2)),
  value2 = abs(round(rnorm(61), 2)),
  value3 = abs(round(rnorm(61), 2))
)
head(ddat)
tail(ddat)
We want to forecast future values of the three value columns. Because we want to save the results of these models in a list, let’s begin by creating a list with one element per column of our data frame, named after the columns.
lst.names <- colnames(ddat)
lst <- vector("list", length(lst.names))
names(lst) <- lst.names
lst
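To make the nested-list idea concrete, here is a minimal base R sketch of how one element of such a list can be filled and read back. The names `demo`, `n_obs`, and `mean`, and the stored numbers, are illustrative assumptions and not part of the original code. Each slot starts as an empty list here so the example does not depend on how `[[<-` coerces a `NULL` slot.

```r
# Column names mirroring the example data frame
cols <- c("date", "value1", "value2", "value3")

# One named slot per column, each starting as an empty list
demo <- setNames(replicate(length(cols), list(), simplify = FALSE), cols)

# Store results for one column under descriptive keys (made-up values)
demo[["value1"]][["n_obs"]] <- 61
demo[["value1"]][["mean"]]  <- 0.8

# Read a nested entry back by column name, then by key
demo[["value1"]][["n_obs"]]

# Slots that were never filled stay empty
length(demo[["value2"]])
```

Indexing by name rather than by position keeps each result attached to the column it came from.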
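I’ve gone ahead and written a user-defined function that handles the batch forecasting process. It relies on the forecast package and takes two arguments: a data frame whose first column holds the dates, and n_train, the number of observations used in the training set (55 by default). For each value column, the function fits an automatic ARIMA model to the training window, forecasts the held-out observations, and computes accuracy measures. If the first and last test forecasts are identical or the mean absolute error is at least 0.025, it stores a moving-average forecast instead; otherwise it refits the ARIMA model on the full series and forecasts five steps ahead. The model estimates, forecasts, and diagnostic measures are saved in a nested list under the appropriate variable name.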
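library(forecast)  # provides auto.arima(), forecast(), accuracy(), ma()

batch <- function(data, n_train = 55) {
  # One list element per column, named after the column
  lst.names <- colnames(data)
  lst <- vector("list", length(lst.names))
  names(lst) <- lst.names

  test_rows <- (n_train + 1):nrow(data)  # held-out observations
  h_test <- length(test_rows)            # test horizon (6 with the defaults)

  # The dates live in column 1; store the train/test split once
  lst[[1]][["train_dates"]] <- data[1:n_train, 1]
  lst[[1]][["test_dates"]] <- data[test_rows, 1]

  for (i in 2:ncol(data)) {
    # Fit on the training window and forecast the test window
    est <- auto.arima(data[1:n_train, i])
    fcas <- forecast(est, h = h_test)$mean
    acc <- accuracy(fcas, data[test_rows, i])
    fcas_upd <- data.frame(date = data[test_rows, 1],
                           forecast = fcas,
                           actual = data[test_rows, i])
    lst[[i]][["estimates"]] <- est
    lst[[i]][["forecast"]] <- fcas
    lst[[i]][["forecast_f"]] <- fcas_upd
    lst[[i]][["accuracy"]] <- acc

    # Fall back to a moving average if the first and last test forecasts
    # are equal or the test MAE is at least 0.025
    cond1 <- diff(range(fcas[1], fcas[length(fcas)])) == 0
    cond2 <- acc[, "MAE"] >= 0.025
    if (cond1 || cond2) {
      mfcas <- forecast(ma(data[, i], order = 3), h = 5)
      lst[[i]][["moving_average"]] <- mfcas
    } else {
      # Refit on the full series and forecast from the refitted model
      est2 <- auto.arima(data[, i])
      fcas2 <- forecast(est2, h = 5)$mean
      lst[[i]][["estimates_full"]] <- est2
      lst[[i]][["forecast_full"]] <- fcas2
    }
  }
  return(lst)
}

batch(ddat)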
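The holdout step inside batch() comes down to comparing forecasts against held-out actual values. As a rough illustration of what the first three accuracy columns (ME, RMSE, MAE) measure, the base R sketch below computes them by hand. The vectors `actual` and `predicted` are made-up numbers, and the error is taken as actual minus forecast, which is the convention I understand accuracy() to follow.

```r
# Made-up holdout values and forecasts, for illustration only
actual    <- c(1.0, 2.0, 3.0, 4.0)
predicted <- c(1.5, 1.5, 3.5, 3.5)

# Forecast errors: actual minus forecast
err <- actual - predicted

me   <- mean(err)          # mean error (signed, so errors can cancel out)
rmse <- sqrt(mean(err^2))  # root mean squared error
mae  <- mean(abs(err))     # mean absolute error

round(c(ME = me, RMSE = rmse, MAE = mae), 3)

# The same kind of threshold check batch() applies to the MAE
use_fallback <- mae >= 0.025
use_fallback
```

Because the example data are absolute values of standard normal draws rounded to two decimals, an MAE below 0.025 would be unusual, so the moving-average branch should be expected most of the time with that threshold.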
This isn’t the prettiest code, but it gets the job done. Note that lst was populated within a function and won’t be available in the global environment. Instead, I chose to simply print out the contents of the list after the function is evaluated.
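If the list is needed later, one option is to assign the function’s return value to a name rather than only printing it. The sketch below shows that pattern with a deliberately simple stand-in, `batch_mean()`, which is hypothetical: it forecasts each column with its training mean instead of calling auto.arima(), so it runs in base R. Only the assign-and-extract pattern is the point; `toy` and `results` are likewise illustrative names.

```r
# Hypothetical stand-in for batch(): forecasts each column with its training mean
batch_mean <- function(data, n_train = 55) {
  test_rows <- (n_train + 1):nrow(data)
  out <- list()
  for (nm in colnames(data)[-1]) {  # skip the date column
    fc <- rep(mean(data[1:n_train, nm]), length(test_rows))
    out[[nm]] <- list(
      forecast = fc,
      mae      = mean(abs(data[test_rows, nm] - fc))
    )
  }
  out
}

set.seed(1)  # make the toy data reproducible
toy <- data.frame(
  date   = seq(as.Date("2010-01-01"), by = 1, length.out = 61),
  value1 = abs(rnorm(61)),
  value2 = abs(rnorm(61))
)

# Keep the returned list in the global environment
results <- batch_mean(toy)

names(results)                      # one entry per value column
sapply(results, function(x) x$mae)  # holdout MAE for each column
```

The same assignment works with the real function, for example `results <- batch(ddat)`, after which individual pieces can be pulled out by column name.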