
Errors on percentage errors


The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}\bigl(|y_t - \hat{y}_t|/|y_t|\bigr),$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$.
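
In R, the definition is a one-liner; here is a minimal sketch (the function name and arguments are mine, not from any package):

```r
# MAPE as defined above: the mean of 100 * |y_t - yhat_t| / |y_t|
mape <- function(y, yhat) {
  100 * mean(abs(y - yhat) / abs(y))
}
```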

Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”. A few years later, Armstrong and Collopy (1992) argued that the MAPE “puts a heavier penalty on forecasts that exceed the actual than those that are less than the actual”. Makridakis (1993) took up the argument saying that “equal errors above the actual value result in a greater APE than those below the actual value”. He provided an example where $y_t = 150$ and $\hat{y}_t = 100$, so that the relative error is 50/150 = 0.33, in contrast to the situation where $y_t = 100$ and $\hat{y}_t = 150$, when the relative error would be 50/100 = 0.50.

Thus, the MAPE puts a heavier penalty on negative errors (when $y_t < \hat{y}_t$) than on positive errors. This is what is stated in my textbook. Unfortunately, Anne Koehler and I got it the wrong way around in our 2006 paper on measures of forecast accuracy, where we said the heavier penalty was on positive errors. We were probably thinking that a forecast that is too large is a positive error. However, forecast errors are defined as $y_t - \hat{y}_t$, so positive errors arise only when the forecast is too small.
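
Using the mape() helper sketched above, Makridakis’s example is easy to check:

```r
mape(y = 150, yhat = 100)  # positive error (forecast too small): 33.3
mape(y = 100, yhat = 150)  # negative error (forecast too large): 50
```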

To avoid the asymmetry of the MAPE, Armstrong (1985, p.348) proposed the “adjusted MAPE”, which he defined as

$$\overline{\text{MAPE}} = 100\,\text{mean}\bigl(2|y_t - \hat{y}_t|/(y_t + \hat{y}_t)\bigr).$$

By that definition, the adjusted MAPE can be negative (if $y_t + \hat{y}_t < 0$), or infinite (if $y_t + \hat{y}_t = 0$), although Armstrong claims that it has a range of (0,200). Presumably he never imagined that data and forecasts can take negative values. Strangely, there is no reference to this measure in Armstrong and Collopy (1992).
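
A corresponding sketch of the adjusted MAPE shows both problems directly (again, the helper name is mine):

```r
# Adjusted MAPE: the mean of 200 * |y_t - yhat_t| / (y_t + yhat_t)
adj_mape <- function(y, yhat) {
  100 * mean(2 * abs(y - yhat) / (y + yhat))
}

adj_mape(y = -2, yhat = 1)  # y + yhat < 0, so the result is negative (-600)
adj_mape(y = -1, yhat = 1)  # y + yhat = 0, so the result is infinite (Inf)
```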

Makridakis (1993) proposed almost the same measure, calling it the “symmetric MAPE” (sMAPE), but without crediting Armstrong (1985), defining it as

$$\text{sMAPE} = 100\,\text{mean}\bigl(2|y_t - \hat{y}_t|/|y_t + \hat{y}_t|\bigr).$$

However, in the M3 competition paper by Makridakis and Hibon (2000), sMAPE is defined equivalently to Armstrong’s adjusted MAPE (without the absolute values in the denominator), again without reference to Armstrong (1985). Makridakis and Hibon claim that this version of sMAPE has a range of (-200,200).
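
In R, the two published versions differ only in where the absolute values sit; a sketch of both (helper names mine):

```r
# Makridakis (1993): absolute value around the whole denominator
smape_1993 <- function(y, yhat) {
  100 * mean(2 * abs(y - yhat) / abs(y + yhat))
}

# Makridakis & Hibon (2000), equivalent to Armstrong's adjusted MAPE:
# no absolute values in the denominator
smape_m3 <- function(y, yhat) {
  100 * mean(2 * abs(y - yhat) / (y + yhat))
}
```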

Flores (1986) proposed a modified version of Armstrong’s measure, defined as exactly half of the adjusted MAPE defined above. He claimed (again incorrectly) that it had an upper bound of 100.

Of course, the true range of the adjusted MAPE is $(-\infty, \infty)$, as is easily seen by considering the two cases $y_t + \hat{y}_t = \varepsilon$ and $y_t + \hat{y}_t = -\varepsilon$, where $\varepsilon > 0$, and letting $\varepsilon \rightarrow 0$. Similarly, the true range of the sMAPE defined by Makridakis (1993) is $(0, \infty)$. I’m not sure that these errors have previously been documented, although they have surely been noticed.
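
The limiting argument is easy to reproduce numerically with the helpers above:

```r
eps <- c(1e-1, 1e-3, 1e-5)
sapply(eps, function(e) adj_mape(y = 1 + e, yhat = -1))    # y + yhat = +e: grows towards +Inf
sapply(eps, function(e) adj_mape(y = 1 - e, yhat = -1))    # y + yhat = -e: grows towards -Inf
sapply(eps, function(e) smape_1993(y = 1 - e, yhat = -1))  # absolute denominator: positive, but still unbounded
```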

Goodwin and Lawton (1999) point out that on a percentage scale, the MAPE is symmetric and the sMAPE is asymmetric. For example, if $y_t = 100$, then $\hat{y}_t = 110$ gives a 10% error, as does $\hat{y}_t = 90$. Either would contribute the same increment to MAPE, but a different increment to sMAPE.
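
With the helpers above:

```r
c(mape(100, 110), mape(100, 90))          # 10 and 10: symmetric on a percentage scale
c(smape_m3(100, 110), smape_m3(100, 90))  # 9.52 and 10.53: different increments
```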

Anne Koehler (2001), in a commentary on the M3 competition, made the same point, but without reference to Goodwin and Lawton.

Whether symmetry matters or not, and whether we want to work on a percentage or absolute scale, depends entirely on the problem, so these discussions over (a)symmetry don’t seem particularly useful to me.

Chen and Yang (2004), in an unpublished working paper, defined the sMAPE as

$$\text{sMAPE} = \text{mean}\bigl(2|y_t - \hat{y}_t|/(|y_t| + |\hat{y}_t|)\bigr).$$

They still called it a measure of “percentage error” even though they dropped the multiplier 100. At least they got the range correct, stating that this measure has a maximum value of two when either $y_t$ or $\hat{y}_t$ is zero, but is undefined when both are zero. The range of this version of sMAPE is (0,2). Perhaps this is the definition that Makridakis and Armstrong intended all along, although neither has ever managed to include it correctly in one of their papers or books.
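
A sketch of the Chen and Yang version, with its boundary behaviour (helper name mine):

```r
# Chen and Yang (2004): no multiplier of 100, and absolute values
# around each term in the denominator
smape_cy <- function(y, yhat) {
  mean(2 * abs(y - yhat) / (abs(y) + abs(yhat)))
}

smape_cy(y = 0, yhat = 3)  # one of the pair is zero: maximum value of 2
smape_cy(y = 0, yhat = 0)  # both zero: 0/0, so the result is NaN (undefined)
```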

As will be clear by now, the literature on this topic is littered with errors. The Wikipedia page on sMAPE contains several as well, which a reader might like to correct.

If all data and forecasts are non-negative, then the same values are obtained from all three definitions of sMAPE. But more generally, the last definition above from Chen and Yang is clearly the most sensible, if the sMAPE is to be used at all. In the M3 competition, all data were positive, but some forecasts were negative, so the differences are important. However, I can’t match the published results for any definition of sMAPE, so I’m not sure how the calculations were actually done.
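
To see the point, here is a small made-up example (the numbers are arbitrary, chosen only so that one forecast makes the denominator of the adjusted MAPE negative):

```r
y     <- c(120, 135, 150)
f_pos <- c(110, 140, 160)   # all forecasts positive: the three versions agree
f_neg <- c(110, 140, -160)  # one large negative forecast: they disagree

c(smape_1993(y, f_pos), smape_m3(y, f_pos), 100 * smape_cy(y, f_pos))
c(smape_1993(y, f_neg), smape_m3(y, f_neg), 100 * smape_cy(y, f_neg))
```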

Personally, I would much prefer that either the original MAPE be used (when it makes sense), or the mean absolute scaled error (MASE) be used instead. There seems little point using the sMAPE except that it makes it easy to compare the performance of a new forecasting algorithm against the published M3 results. But even there, it is not necessary, as the forecasts submitted to the M3 competition are all available in the Mcomp package for R, so a comparison can easily be made using whatever measure you prefer.
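
For example, a minimal sketch along these lines, assuming the forecast and Mcomp packages are installed (the series and the model are arbitrary choices for illustration):

```r
# Fit a model to one M3 series and report accuracy measures
# (including MAPE and MASE) on the hold-out data.
library(forecast)
library(Mcomp)

series <- M3[[1]]                      # $x is the training data, $xx the test data
fc <- forecast(ets(series$x), h = length(series$xx))
accuracy(fc, series$xx)                # output includes MAPE and MASE columns
```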


Thanks to Andrey Kostenko for alerting me to the different definitions of sMAPE in the literature.
