Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Makridakis Competitions (also known as the M Competitions or M-Competitions) are a series of competitions organized by teams led by forecasting researcher Spyros Makridakis and intended to evaluate and compare the accuracy of different forecasting methods. So far three competitions have taken place, named as M1 (1982), M2 (1993) and M3 (2000). The fourth competition is going to take place on year 2018 very soon.
The Mcomp package provides with the set of time series part of the M1 and M3 competitions, specifically 1001 time series from the M1 competition and 3003 from the M3 one.
Packages
suppressPackageStartupMessages(library(Mcomp)) suppressPackageStartupMessages(library(ggplot2)) suppressPackageStartupMessages(library(gridExtra)) suppressPackageStartupMessages(library(Kmisc))
Analysis
The data is provided by the M1 and M3 lists and their summary can be easily printed out.
data(M1) data(M3) print(M1) M-Competition data: 1001 time series Type of data Period DEMOGR INDUST MACRO1 MACRO2 MICRO1 MICRO2 MICRO3 Total MONTHLY 75 183 64 92 10 89 104 617 QUARTERLY 39 18 45 59 5 21 16 203 YEARLY 30 35 30 29 16 29 12 181 Total 144 236 139 180 31 139 132 1001 print(M3) M-Competition data: 3003 time series Type of data Period DEMOGRAPHIC FINANCE INDUSTRY MACRO MICRO OTHER Total MONTHLY 111 145 334 312 474 52 1428 OTHER 0 29 0 0 4 141 174 QUARTERLY 57 76 83 336 204 0 756 YEARLY 245 58 102 83 146 11 645 Total 413 308 519 731 828 204 3003
Monthly, quarterly, yearly and some other time aggregation are available (row names) splitted among a pool of data categories (column names).
Each member of the M1 or M3 list is a Mdata object providing with:
-
sn, name of the series
st, series number and period. For example \”Y1\” denotes first yearly series, \”Q20\” denotes 20th quarterly series and so on
n, the number of observations in the time series
h, the number of required forecasts period Interval of the time series.
Possible values are “YEARLY”, “QUARTERLY”, “MONTHLY” & “OTHER”
type, the type of series.
Possible values for M1 are “DEMOGR”, “INDUST”, “MACRO1”, “MACRO2”, “MICRO1”, “MICRO2” & “MICRO3”.
Possible values for M3 are “DEMOGRAPHIC”, “FINANCE”, “INDUSTRY”, “MACRO”, “MICRO”, “OTHER”
description, a short description of the time series
x, a time series of length n (the historical data)
xx, a time series of length h (the future data)
Let us have a look at M1 and M3 first ten elements names.
names(M1)[1:10] [1] "YAF2" "YAF3" "YAF4" "YAF5" "YAF6" "YAF7" "YAF8" "YAF9" "YAF10" "YAF11" names(M3)[1:10] [1] "N0001" "N0002" "N0003" "N0004" "N0005" "N0006" "N0007" "N0008" "N0009" "N0010"
Each element is a Mdata S3 class instance. For example:
ts_1 <- M3[["N0001"]] print(ts_1) Series: Y1 Type of series: MICRO Period of series: YEARLY Series description: SALES ( CODE= ABT) HISTORICAL data Time Series: Start = 1975 End = 1988 Frequency = 1 [1] 940.66 1084.86 1244.98 1445.02 1683.17 2038.15 2342.52 2602.45 2927.87 3103.96 3360.27 [12] 3807.63 4387.88 4936.99 FUTURE data Time Series: Start = 1989 End = 1994 Frequency = 1 [1] 5379.75 6158.68 6876.58 7851.91 8407.84 9156.01
Plots can be achieved by plot() or autoplot() functions, as shown below. The future horizon time series data is plot in red color.
idx <- sample(1:length(M3), 12) par(mfrow=c(4,3)) invisible(lapply(idx, function(x) { plot(M3[[x]])})) par(mfrow=c(1,1))
plot_list <- invisible(lapply(idx, function(x) { autoplot(M3[[x]])})) grob_plots <- invisible(lapply(chunk(1, length(plot_list), 12), function(x) { marrangeGrob(grobs=lapply(plot_list[x], ggplotGrob), nrow=4, ncol=3)})) grob_plots
We can figure out an helper function that allows the comparison of accuracy and shows resulting plots for the forecasts based on two specified fit methods and parameters. The horizon future is based on the length of future available data. The accuracy is determined by the accuracy() function provided within the forecast package (see refs [4] and [5]).
prepare_arg_list <- function(fit_func, x) { func_sig <- names(formals(fit_func)) arg_list <- list() if ("y" %in% func_sig) { arg_list <- list(y=x) } else if ("data" %in% func_sig) { arg_list <- list(data=x) } else if ("x" %in% func_sig) { arg_list <- list(x=x) } arg_list } analyze_forecast <- function(Mobj, ts_idx, fit_func1, args1 = NULL, fit_func2, args2 = NULL) { ts <- Mobj[[ts_idx]] x <- ts$x xx <- ts$xx h <- length(xx) arg_list1 <- prepare_arg_list(fit_func1, x) model_1 <- do.call(fit_func1, args=c(arg_list1, args1)) ts_fc_1 <- forecast(model_1, h=h) arg_list2 <- prepare_arg_list(fit_func2, x) model_2 <- do.call(fit_func2, args=c(arg_list2, args2)) ts_fc_2 <- forecast(model_2, h=h) cat(paste("Accuracy Forecast Method:", substitute(fit_func1), "\n")) print(accuracy(ts_fc_1, xx)) cat("\n") cat(paste("Accuracy Forecast Method:", substitute(fit_func2), "\n")) print(accuracy(ts_fc_2, xx)) par(mfrow=c(1,2)) plot(ts_fc_1) lines(xx, col ='red') plot(ts_fc_2) lines(xx, col='red') par(mfrow=c(1,1)) list(fc1=ts_fc_1, fc2=ts_fc_2) }
Then we can use it to compare forecast method for a given time series, future horizon and forecast methods. Some examples follows.
Herein below, we compare forecasts generated by ets (exponential smoothing state space model) vs ARIMA auto.arima() function model found. The stepwise = FALSE parameter is passed as input to auto.arima() to allow for a more exhaustive search of candidate models. Both functions are available within forecast package.
analyze_forecast(M3, 1, ets, "ZZZ", auto.arima, list(stepwise=FALSE)) Accuracy Forecast Method: ets ME RMSE MAE MPE MAPE MASE Training set 15.68064 97.95396 81.58716 -0.1594235 3.688927 0.2654018 Test set 445.03563 577.31243 480.61641 5.3426543 6.004038 1.5634378 ACF1 Theil's U Training set 0.1011023 NA Test set 0.4859372 0.7108304 Accuracy Forecast Method: auto.arima ME RMSE MAE MPE MAPE MASE ACF1 Training set 28.88508 88.49504 68.3793 1.096870 2.439216 0.2224368 0.0146484 Test set 446.25333 578.60264 481.7033 5.358426 6.017378 1.5669735 0.4860139 Theil's U Training set NA Test set 0.712472 $fc1 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1989 5486.492 5164.145 5808.839 4993.505 5979.479 1990 6035.932 5300.681 6771.184 4911.463 7160.402 1991 6585.373 5324.965 7845.780 4657.745 8513.000 1992 7134.813 5242.131 9027.495 4240.205 10029.420 1993 7684.253 5053.545 10314.961 3660.932 11707.574 1994 8233.693 4758.261 11709.125 2918.478 13548.908 $fc2 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1989 5486.10 5363.602 5608.598 5298.756 5673.444 1990 6035.21 5761.297 6309.123 5616.295 6454.125 1991 6584.32 6125.975 7042.665 5883.342 7285.298 1992 7133.43 6462.482 7804.378 6107.303 8159.557 1993 7682.54 6774.072 8591.008 6293.158 9071.922 1994 8231.65 7063.095 9400.205 6444.500 10018.800
We compare forecast from stlm() with ARIMA model vs stlm() with ets model. The stlm() function is provided within the forecast package. It takes a time series y, applies an STL decomposition, and models the seasonally adjusted data using the model passed as modelfunction or specified using method, in our case arima and ets. The stlm() function returns an object that includes the original STL decomposition and a time series model fitted to the seasonally adjusted data. This object can be passed to the forecast.stlm for forecasting.
analyze_forecast(M3, 700, stlm, list(method=c("arima")), stlm, list(method=c("ets"))) Accuracy Forecast Method: stlm ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -11.90287 402.5976 252.8155 -0.4288085 4.232485 0.3940372 0.1205458 NA Test set 792.38609 863.2711 792.3861 11.8446454 11.844645 1.2350097 0.2029283 1.993663 Accuracy Forecast Method: stlm ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -11.42326 402.6201 253.3089 -0.4211643 4.240385 0.3948063 0.1206962 NA Test set 792.33085 863.2204 792.3309 11.8438007 11.843801 1.2349236 0.2029283 1.993547 $fc1 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5799.191 5275.923 6322.459 4998.921 6599.461 1993 Q2 5796.247 5056.234 6536.260 4664.494 6928.000 1993 Q3 5838.818 4932.490 6745.145 4452.709 7224.926 1993 Q4 5624.900 4578.363 6671.437 4024.360 7225.440 1994 Q1 5799.191 4629.127 6969.255 4009.733 7588.649 1994 Q2 5796.247 4514.506 7077.988 3835.994 7756.500 1994 Q3 5838.818 4454.380 7223.256 3721.502 7956.133 1994 Q4 5624.900 4144.874 7104.926 3361.395 7888.405 $fc2 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5799.246 5340.654 6257.838 5097.890 6500.602 1993 Q2 5796.302 5147.163 6445.441 4803.530 6789.075 1993 Q3 5838.873 5043.090 6634.656 4621.828 7055.917 1993 Q4 5624.955 4705.186 6544.724 4218.290 7031.621 1994 Q1 5799.246 4769.926 6828.566 4225.037 7373.456 1994 Q2 5796.302 4667.653 6924.951 4070.183 7522.422 1994 Q3 5838.873 4618.618 7059.128 3972.654 7705.092 1994 Q4 5624.955 4319.189 6930.721 3627.958 7621.952
We then compare forecasts based on stlm() with ARIMA vs. ets() ones.
analyze_forecast(M3, 700, stlm, list(method=c("arima")), ets) Accuracy Forecast Method: stlm ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -11.90287 402.5976 252.8155 -0.4288085 4.232485 0.3940372 0.1205458 NA Test set 792.38609 863.2711 792.3861 11.8446454 11.844645 1.2350097 0.2029283 1.993663 Accuracy Forecast Method: ets ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -38.53996 477.8266 314.552 -0.9081287 5.189619 0.4902595 0.02021015 NA Test set 932.19799 990.8932 932.198 13.9795463 13.979546 1.4529200 0.25180619 2.285006 $fc1 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5799.191 5275.923 6322.459 4998.921 6599.461 1993 Q2 5796.247 5056.234 6536.260 4664.494 6928.000 1993 Q3 5838.818 4932.490 6745.145 4452.709 7224.926 1993 Q4 5624.900 4578.363 6671.437 4024.360 7225.440 1994 Q1 5799.191 4629.127 6969.255 4009.733 7588.649 1994 Q2 5796.247 4514.506 7077.988 3835.994 7756.500 1994 Q3 5838.818 4454.380 7223.256 3721.502 7956.133 1994 Q4 5624.900 4144.874 7104.926 3361.395 7888.405 $fc2 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5624.977 5120.941 6129.013 4854.121 6395.833 1993 Q2 5624.977 4911.328 6338.626 4533.545 6716.409 1993 Q3 5624.977 4749.886 6500.068 4286.640 6963.314 1993 Q4 5624.977 4613.281 6636.673 4077.722 7172.232 1994 Q1 5624.977 4492.488 6757.466 3892.984 7356.970 1994 Q2 5624.977 4382.881 6867.073 3725.355 7524.599 1994 Q3 5624.977 4281.718 6968.236 3570.640 7679.314 1994 Q4 5624.977 4187.213 7062.741 3426.107 7823.847
We compare forecasts from forecast package tslm() function allowing for trend and seasonality regression and forecasts generated by ets() function.
analyze_forecast(M3, 700, tslm, list(formula = x ~ trend + season), ets) Accuracy Forecast Method: tslm ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -5.050281e-14 614.945 535.8089 -0.9563518 8.701461 0.8351095 0.7369323 NA Test set 1.260770e+03 1313.898 1260.7700 19.0010624 19.001062 1.9650310 0.2691452 3.065128 Accuracy Forecast Method: ets ME RMSE MAE MPE MAPE MASE ACF1 Theil's U Training set -38.53996 477.8266 314.552 -0.9081287 5.189619 0.4902595 0.02021015 NA Test set 932.19799 990.8932 932.198 13.9795463 13.979546 1.4529200 0.25180619 2.285006 $fc1 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5425.700 4469.084 6382.316 3935.752 6915.648 1993 Q2 5448.222 4491.606 6404.839 3958.275 6938.170 1993 Q3 5441.478 4484.861 6398.094 3951.530 6931.425 1993 Q4 5176.700 4220.084 6133.316 3686.752 6666.648 1994 Q1 5272.460 4297.971 6246.949 3754.676 6790.244 1994 Q2 5294.982 4320.494 6269.471 3777.198 6812.766 1994 Q3 5288.238 4313.749 6262.726 3770.454 6806.022 1994 Q4 5023.460 4048.971 5997.949 3505.676 6541.244 $fc2 Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 1993 Q1 5624.977 5120.941 6129.013 4854.121 6395.833 1993 Q2 5624.977 4911.328 6338.626 4533.545 6716.409 1993 Q3 5624.977 4749.886 6500.068 4286.640 6963.314 1993 Q4 5624.977 4613.281 6636.673 4077.722 7172.232 1994 Q1 5624.977 4492.488 6757.466 3892.984 7356.970 1994 Q2 5624.977 4382.881 6867.073 3725.355 7524.599 1994 Q3 5624.977 4281.718 6968.236 3570.640 7679.314 1994 Q4 5624.977 4187.213 7062.741 3426.107 7823.847
Further, the Mcomp package makes available forecasts produced by M3 competitors and benchmarks as provided by the M3Forecast list. Hereinbelow the name of M3 competitors and benchmarks.
names(M3Forecast) [1] "NAIVE2" "SINGLE" "HOLT" "DAMPEN" "WINTER" "COMB S-H-D" [7] "B-J auto" "AutoBox1" "AutoBox2" "AutoBox3" "ROBUST-Trend" "ARARMA" [13] "Auto-ANN" "Flors-Pearc1" "Flors-Pearc2" "PP-Autocast" "ForecastPro" "SMARTFCS" [19] "THETAsm" "THETA" "RBF" "ForcX" "AAM1" "AAM2"
A helper function allows evaluating the accuracy of those forecasts.
m3_competition_accuracy <- function(ts_idx, competitorName, plot=FALSE) { v <- na.omit(as.numeric(M3Forecast[[competitorName]][ts_idx,])) xx <- M3[[ts_idx]]$xx sn <- M3[[ts_idx]]$sn v_ts <- ts(v, frequency = frequency(xx), start = start(xx)) if (plot) { plot(M3[[ts_idx]], main = sn) legend("topleft", legend = c("Actual", competitorName), fill = c("red", "blue")) lines(v_ts, col = 'blue') } accuracy(v_ts, xx) }
Herein below some examples.
m3_competition_accuracy(150, "HOLT", TRUE) ME RMSE MAE MPE MAPE ACF1 Theil's U Test set 571.4183 2130.361 1913.418 -0.1041563 32.05945 0.5902855 1.721647
m3_competition_accuracy(150, "ROBUST-Trend", TRUE) ME RMSE MAE MPE MAPE ACF1 Theil's U Test set 923.1167 2108.68 1762.95 7.199077 27.50348 0.5838244 1.512274
Conclusions
Mcomp package represents a valuable resource for time series analysis. There is also available the Tcomp package providing with 1311 time series of class Mcomp from the tourism forecasting competition conducted in 2010.
If you have any questions, please feel free to comment below.
References
- [1] Mcomp package vignette
[2] Tcomp package vignette
[3] Makridakis Competitions
[4] Forecast package
[5] Evaluating Forecast Accuracy, Prof. Rob J. Hyndman
Related Post
- Building a Hacker News scraper with 8 lines of R code using rvest library
- Building a Telecom Dictionary scraping web using rvest in R
- Scraping Javascript-rendered web content using R
- Analysing Cryptocurrency Market in R
- Time Series Analysis in R Part 3: Getting Data from Quandl
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.