Basic Forecasting
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Forecasting refers to the process of using statistical procedures to predict future values of a time series based on historical trends. For businesses, being able gauge expected outcomes for a given time period is essential for managing marketing, planning, and finances. For example, an advertising agency may want to utilizes sales forecasts to identify which future months may require increased marketing expenditures. Companies may also use forecasts to identify which sales persons met their expected targets for a fiscal quarter.
There are a number of techniques that can be utilized to generate quantitative forecasts. Some methods are fairly simple while others are more robust and incorporate exogenous factors. Regardless of what is utilized, the first step should always be to visualize the data using a line graph. You want to consider how the metric changes over time, whether there is a distinct trend, or if there are distinct patterns that are noteworthy.
data <- structure(c(12, 20.5, 21, 15.5, 15.3, 23.5, 24.5, 21.3, 23.5, 28, 24, 15.5, 17.3, 25.3, 25, 36.5, 36.5, 29.6, 30.5, 28, 26, 21.5, 19.7, 19, 16, 20.7, 26.5, 30.6, 32.3, 29.5, 28.3, 31.3, 32.2, 26.4, 23.4, 16.4, 15, 16, 18, 27, 21, 49, 21, 22, 28, 36, 40, 3, 21, 29, 62, 65, 46, 44, 33, 62, 22, 12, 24, 3, 5, 14, 36, 40, 49, 7, 52, 65, 17, 5, 17, 1), .Dim = c(36L, 2L), .Dimnames = list(NULL, c("Advertising", "Sales")), .Tsp = c(2006, 2008.91666666667, 12), class = c("mts", "ts", "matrix")) head(data); nrow(data) plot(data)
There are several key concepts that we should be cognizant of when describing time series data. These characteristics will inform how we pre-process the data and select the appropriate modeling technique and parameters. Ultimately, the goal is to simplify the patterns in the historical data by removing known sources of variatiion and making the patterns more consistent across the entire data set. Simpler patterns will generally lead to more accurate forecasts.
Trend: A trend exists when there is a long-term increase or decrease in the data.
Seasonality: A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week.
Autocorrelation: Refers to the pheneomena whereby values of Y at time t are impacted by previous values of Y at t-i. To find the proper lag structure and the nature of auto correlated values in your data, use the autocorrelation function plot.
Stationary: A time series is said to be stationary if there is no systematic trend, no systematic change in variance, and if strictly periodic variations or seasonality do not exist
Quantitative forecasting techniques are usually based on reression analysis or time series techniques. Regression approaches examine the relationship between the forecasted variable and other explanatory variables using cross-sectional data. Time series models use hitorical data that’s been collected at regular intervals over time for the target variablle to forecast its future values. There isn’t time to cover the theory behind each of these approaches in this post, so I’ve chosen to cover high level concepts and provide code for performing time series forecasting in R. I strongly suggest understandig the statistical theory behind a technique before running the code.
First, we can use the ma function in the forecast package to perform forecasting using the moving average method. This technique estimates future values at time t by averaging values of the time series within k periods of t. When the time series is stationary, the moving average can be very effective as the observations are nearby across time.
moving_average = forecast(ma(tdat[1:31], order=2), h=5) moving_average_accuracy = accuracy(moving_average, tdat[32:36]) moving_average; moving_average_accuracy plot(moving_average, ylim=c(0,60)) lines(tdat[1:36])
The simple exponential smooting is also good when the data has no trend or seasonal patterns. Unlike a moving average, this technique gives greater weight to the most recent observations of the time series.
exp <- ses(tdat[1:31], 5, initial="simple") exp_accuracy = accuracy(exp, tdat[32:36]) exp; exp_accuracy plot(exp, ylim=c(0,60)) lines(tdat[1:36])
In the forecast package, there is an automatic forecasting function that will run through possible models and select the most appropriate model give the data. This could be an auto regressive model of the first oder (AR(1)), an ARIMA model with the right values for p, d, and q, or something else that is more appropriate.
train = tdat[1:31] test = tdat[32:36] arma_fit <- auto.arima(train) arma_forecast <- forecast(arma_fit, h = 5) arma_fit_accuracy <- accuracy(arma_forecast, test) arma_fit; arma_forecast; arma_fit_accuracy plot(arma_forecast, ylim=c(0,60)) lines(tdat[1:36])
There you go, a basic non-technical introduction to forecasting. This should get one familiar with the key concepts and how to perform some basic forecasting in R
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.