«smooth» package for R. Intermittent state-space model. Part I. Introducing the model
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Intro
One of the features of functions of smooth package is the ability to work with intermittent data and the data with periodically occurring zeroes.
Intermittent time series is a series that has non-zero values occurring at irregular frequency (Svetuknov and Boylan, 2017). Imagine retailer who sells green leap sticks. The demand on such a product will not be easy to predict, because green colour is not a popular colour in this case, and thus the data of sales will contain a lot of zeroes with seldom non-zero values. Such demand is called “intermittent”. In fact, many products exhibit intermittent patterns in sales, especially when we increase the frequency of measurement (how many tomatoes and how often does a store sell per day? What about per hour? Per minute?).
The other case is when watermelons are sold in high quantities over summer and are either not sold at all or sold very seldom in winter. In this case the demand might be intermittent or even absent in winter and have a nature of continuous demand during summer.
smooth functions can work with both of these types of data, building upon mixture distributions.
In this post we will discuss the basic intermittent demand statistical models implemented in the package.
The model
First, it is worth pointing out that the approach that is used in the statistical methods and models discussed in this post assumes that the final demand on the product can be split into two parts (Croston, 1972):
- Demand occurrence part, which is represented by a binary variable, which is equal to one when there is a non-zero demand in period \(t\) and zero otherwise;
- Demand sizes part, which reflects the amount of sold product when demand occurrence part is equal to one.
This can be represented mathematically by the following equation:
\begin{equation} \label{eq:iSS}
y_t = o_t z_t ,
\end{equation}
where \(o_t\) is the binary demand occurrence variable, \(z_t\) is the demand sizes variable and \(y_t\) is the final demand. This equation was originally proposed by Croston, (1972), although he has never considered the complete statistical model and only proposed a forecasting method.
There are several intermittent demand methods that are usually discussed in forecasting literature: Croston (Croston, 1972), SBA (Syntetos & Boylan, 2000) and TSB (Teunter et al., 2011). These are good methods that work well in intermittent demand context (see, for example, Kourentzes, 2014). The only limitation is that they are the “methods” and not “models”. Having models becomes important when you want to include additional components and produce proper prediction intervals, need the ability to select the appropriate components and do proper statistical inference. So, John Boylan and I developed the model underlying these methods (Svetunkov & Boylan, 2017), based on \eqref{eq:iSS}. It is built upon ETS framework, so we call it “iETS”. Given that all the intermittent demand forecasting methods rely on simple exponential smoothing (SES), we suggested to use ETS(M,N,N) model for both demand sizes and demand occurrence parts, because it underlies SES (Hyndman et al., 2008). One of the key assumptions in our model is that demand sizes and demand occurrences are independent of each other. Although, this is an obvious simplification, it is inherited from Croston and TSB, and seems to work well in many contexts.
The iETS(M,N,N) model, discussed in our paper is formulated the following way:
\begin{equation}
\begin{matrix}
y_t = o_t z_t \\
z_t = l_{z,t-1} \left(1 + \epsilon_t \right) \\
l_{z,t} = l_{z,t-1}( 1 + \alpha_z \epsilon_t) \\
o_t \sim \text{Bernoulli}(p_t)
\end{matrix} ,
\end{equation}
where \(z_t\) is represented by the ETS(M,N,N) model, \(l_{z,t}\) is the level of demand sizes, \(\alpha_z\) is the smoothing parameter and \(\epsilon_t\) is the error term. The important assumption in our implementation of the model is that \(\left(1 + \epsilon_t \right) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \) – something that we discussed in one of the previous posts. This means that the demand will always be positive. However if you deal with some other type of data, where negative values are natural, then you might want to stick with pure additive model.
As a side note, we have advanced notations for the iETS models, which will be discuss in the next post. For now we will stick with the level models and use the shorter names.
Having this statistical model, makes it extendable, so that one can add trend, seasonal component, or exogenous variables. We don’t discuss these elements in our paper, but it is briefly mentioned in the conclusions. And we don’t discuss these features just yet, we will cover them in the next post.
Now the main question that stays unanswered is how to model the probability \(p_t\). And there are several approaches to that:
- iETS\(_f\) – assume that the demand occurs at random with a fixed probability (so \(p_t = p\)).
- iETS\(_i\) – the interval-based model, with a principle suggested by Croston (1972). In this case we assume that the probability is constant between the demand occurrences and that the intervals between the occurrences are inverse proportional to the respective probabilities in time. In this case we model and forecast the number of zeroes between the non-zero demands and then invert the forecasted value. This is one of the oldest forecasting approaches for intermittent demand, but it is not always the most accurate one. Still it seems that this model works well, when demand is building up.
- iETS\(_p\) – the probability-based model, the principle suggested by Teunter et al., (2011). In this case the probability is updated directly based on the values of occurrence variable, using SES method. This model works well in any case when probability changes over time.
In case (1) there is no model underlying the probability, we just estimate the value and produce forecasts. In cases of (2) and (3), we suggest using another ETS(M,N,N) model as underlying each of these processes. So when it comes to producing forecasts, in both cases we assume that future level of probability will be the same as the last obtained one (level forecast from the local-level model). After that the final forecast is generated using:
\begin{equation} \label{eq:iSSForecast}
\hat{y}_t = \hat{p}_t \hat{z}_t ,
\end{equation}
where \(\hat{p}_t\) is the forecast of the probability, \(\hat{z}_t\) is the forecast of demand sizes and \(\hat{y}_t\) is the final forecast of the intermittent demand.
Summarising advantages of our framework:
- Our model is extendable: you can use any ETS model and even introduce exogenous variables. In fact, you can use any model you want for demand sizes and a wide variety of models for demand occurrence variable;
- The model allows selecting between the aforementioned types of intermittent models (“fixed” / “probability” / “intervals”) using information criteria. This mechanism works fine on large samples, but, unfortunately, does not seem to work well in cases of small samples (which is typical for intermittent demand) with the existing information criteria. Potentially, some modifications of criteria need to be done in order to make the mechanism work better. For example, AICc and BICc need to be adjusted in order to take the demand occurrence part into account;
- The model allows producing prediction intervals for several steps ahead and cumulative (over a lead time) upper bound of the intervals. The latter arises naturally from the model and can be used for safety stock calculation;
- The estimation of models is done using likelihood function and not some ad-hoc estimators. This means that the estimates of parameters become efficient and consistent;
- Although the proposed model is continuous, we show in our paper that it is more accurate than several other integer-valued models. Still, if you want to have integer numbers as your final forecasts, you can round up or round down either the point or prediction intervals, ending up with meaningful values. This can be done due to a connection between the quantiles of rounded values and the rounding of quantiles of continuous variable (discussed in Appendix of our paper).
If you need more details, have a look at our working paper (have I already advertise it enough in this post?).
Implementation. Demand occurrence
The aforementioned model with different occurrence types is available in
smoothpackage. There is a special function for demand occurrence part, called
iss()(Intermittent State-Space model) and there is a parameter in every smooth forecasting function called
intermittent, which can be one of: “none”, “fixed”, “interval”, “probability”, “sba”, “auto” or “logistic”. We do not cover “logistic” yet, this will be discussed in the next post. We also don’t discuss “sba” (it is based on Syntetos and Boylan, 2001) and “auto” options – we might cover them later at some point.
So, let’s consider an example with artificial data. We create the following time series:
x <- c(rpois(25,5),rpois(25,1),rpois(25,0.5),rpois(25,0.1))
This way we have an artificial data, where both demand sizes and demand occurrence probability decrease over time stepwise, each 25 observations. The generated data resembles something called “demand obsolescence” or “dying out demand”. Let’s fit the three different models to this time series:
issFixed <- iss(x, intermittent="f", h=25)
Intermittent state space model estimated: Fixed probability Underlying ETS model: MNN Vector of initials: level 0.55 Information criteria: AIC AICc BIC BICc 139.6278 139.6686 142.2329 142.3269
issInterval <- iss(x, intermittent="i", h=25)
Intermittent state space model estimated: Interval-based Underlying ETS model: MNN Smoothing parameters: alpha 0.13 Vector of initials: level 1.008 Information criteria: AIC AICc BIC BICc 127.4387 127.6887 135.2542 135.8299
issProbability <- iss(x, intermittent="p", h=25)
Intermittent state space model estimated: Probability-based Underlying ETS model: MNN Smoothing parameters: alpha 0.101 Vector of initials: level 0.892 Information criteria: AIC AICc BIC BICc 101.4430 101.5667 106.6534 106.9382
By looking at the outputs, we can already say that iETS\(_p\) model performs better than the others in terms of information criteria. This is due to the flexibility of the model mentioned before – it is able to adapt to the changes in probability faster than the other models. Both iETS\(_p\) and iETS\(_i\) have smoothing parameter close to 0.1 and both start from the high probability (in case of iETS\(_i\) initial level is 1.008, which transforms to the probability of \(\frac{1}{1.008} \approx 0.99 \).
We can also plot the actual occurrence variable, fitted and forecasted probabilities using plot function:
plot(issFixed)
plot(issInterval)
plot(issProbability)
Note that the different models capture the probability differently. While iETS\(_f\) averages out the probability, both iETS\(_i\) and iETS\(_p\) models react to the changes in the data, but differently: interval-based model does that only when demand occurs, while probability-based one does that on every observation. Given that in this example, the demand becomes obsolete, neither the iETS\(_f\), nor iETS\(_i\) produce accurate forecasts for the occurrence part.
In addition, in my example the last observation in x is non-zero demand, so both iETS\(_i\) and iETS\(_p\) react to that, each of them slightly differently: if there was a zero, iETS\(_i\) would predict a higher level of occurrence, while iETS\(_p\) would predict the lower one. This is due to the differences in the mechanisms of the probability update in the two models.
We cannot do much with the occurrence part of the model at the moment, because of the limitations that will be discussed in the next post. For now we are stuck with ETS(M,N,N) model there, so, let’s move to the demand sizes part of the model.
Implementation. The whole demand
In order to deal with the intermittent data and produce the forecasts for the whole time series, we can use either
es(), or
ssarima(), or
ces(), or
gum()– all of them have the parameter
intermittent, which is equal to “none” by default. We will use an example of ETS models. In order to simplify things we will use iETS\(_p\) model:
es(x, "MNN", intermittent="p", silent=FALSE, h=25)
The forecast of this model is a straight line, close to zero due to the decrease in both demand sizes and demand occurrence parts. However, knowing that the demand decreases, we can use trend model in this case. And the flexibility of the approach allows us doing that, so we fit ETS(M,M,N) to demand sizes:
es(x, "MMN", intermittent="p", silent=FALSE, h=25)
The forecast in this case is even closer to zero and reaches it asymptotically, which means that we foresee that the demand on our product will on average die out.
We can also produce prediction intervals and use model selection for demand sizes. If you know that the data cannot be negative (e.g. selling tomatoes in kilograms), then I would recommend using the pure multiplicative model:
es(x, "YYN", intermittent="p", silent=FALSE, h=25, intervals=TRUE)
Forming the pool of models based on... MNN, MMN, Estimation progress: ... Done! Time elapsed: 0.2 seconds Model estimated: iETS(MNN) Intermittent model type: Probability-based Persistence vector g: alpha 0.524 Initial values were optimised. 5 parameters were estimated in the process Residuals standard deviation: 0.551 Cost function type: MSE; Cost function value: 1.475 Information criteria: AIC AICc BIC BICc 284.9088 285.3794 294.9455 287.8737 95% parametric prediction intervals were constructed
Compare this graph with the one when the pure additive model is selected:
es(x, "XXN", intermittent="p", silent=FALSE, h=25, intervals=TRUE)
Forming the pool of models based on... ANN, AAN, Estimation progress: ... Done! Time elapsed: 0.14 seconds Model estimated: iETS(ANN) Intermittent model type: Probability-based Persistence vector g: alpha 0.448 Initial values were optimised. 5 parameters were estimated in the process Residuals standard deviation: 1.749 Cost function type: MSE; Cost function value: 2.893 Information criteria: AIC AICc BIC BICc 321.9607 322.4313 331.9974 324.9256 95% parametric prediction intervals were constructed
In the latter case the prediction intervals cover the negative part of the plane which does not make sense in our context. Note also that the information criteria are lower for the multiplicative model, which is due to the changing variance in sample for each 25 observations.
The important thing to note is that although multiplicative trend model sounds solid from the theoretical point of view (it cannot produce negative values), it might be dangerous in cases of small samples and positive trends. In this situation the model can produce exploding trajectory, because the forecast corresponds to the exponent. I don’t have any universal solution for this problem for the moment, but I would recommend using ETS(M,Md,N) (damped multiplicative trend) model instead of ETS(M,M,N). The reason why I don’t recommend ETS(M,A,N), is because in cases of negative trend, with the typically low level of intermittent demand, the updated level might become negative, thus making model inapplicable to the data.
Finally, the experiments that I have conducted so far show that iETS\(_p\) is the most robust intermittent demand model in many cases. This is due to the flexibility of the probability update mechanism, proposed by Teunter et al. (2011).
Still, there is a question: how to make the model even more flexible, so that the occurrence part would be as versatile as the demand sizes part of the iETS model? We will have an answer to this question in the next post…
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.