Comparing additive and multiplicative regressions using AIC in R
One of the basic things students are taught in statistics classes is that models can only be compared using information criteria when they have the same response variable. This means, for example, that if you fit a model to $\log y_t$ and calculate its AIC, the value is not comparable with the AIC of a model fitted to $y_t$. The reason is that the two variables are measured in different scales. But there is a way to make the criteria in these two cases comparable: both variables need to be transformed into the original scale, and we need to understand what distributions they follow in that scale. In our example, if we assume that $\log y_t \sim \mathcal{N}(\log \mu_t, \sigma_l^2)$ (where $\log \mu_t$ is the conditional mean of the model in logarithms and $\sigma_l^2$ is the variance of its residuals), then the exponent of this variable will have a log-normal distribution:
$$y_t \sim \log\mathcal{N}(\log \mu_t, \sigma_l^2)$$
Just as a reminder, all the information criteria rely on the log-likelihood. For example, here’s the formula of AIC:
$$\mathrm{AIC} = 2k - 2\ell,$$
where $k$ is the number of all the estimated parameters and $\ell$ is the value of the log-likelihood.
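As a quick illustration of this formula, the following sketch assembles AIC by hand and compares it with the built-in extractor. It assumes the built-in longley data and a simple regression of GNP on Employed; the names fit, ll, k and aicManual are only illustrative.

```r
# Fit a simple linear model on the built-in longley data
fit <- lm(GNP ~ Employed, data = longley)

# logLik() returns the maximised log-likelihood; its "df" attribute counts
# all estimated parameters (intercept, slope and residual variance)
ll <- logLik(fit)
k <- attr(ll, "df")

# AIC assembled by hand from its definition: 2k - 2 * log-likelihood
aicManual <- 2 * k - 2 * as.numeric(ll)

# Compare with the built-in extractor
all.equal(aicManual, AIC(fit))
```

Note that k is 3 here rather than 2, because the residual variance counts as an estimated parameter.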
If we use the likelihood of the log-normal distribution instead of the likelihood of the normal one in the AIC formula for the variable $y_t$, then the information criteria will become comparable. In order to understand what needs to be done for this transformation, we need to compare the density functions of the two distributions: normal and log-normal. Here's the normal for the variable $\log y_t$:
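$$f(\log y_t \mid \theta, \sigma_l^2) = \frac{1}{\sqrt{2\pi\sigma_l^2}} \exp\left(-\frac{(\log y_t - \log\mu_t)^2}{2\sigma_l^2}\right)$$
and here's the log-normal for the variable $y_t = \exp(\log y_t)$ (the multiplicative model in the original scale):
$$f(y_t \mid \theta, \sigma_l^2) = \frac{1}{y_t} \frac{1}{\sqrt{2\pi\sigma_l^2}} \exp\left(-\frac{(\log y_t - \log\mu_t)^2}{2\sigma_l^2}\right),$$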
where $\theta$ is the vector of parameters of our model. The main difference between the two densities is the factor $\frac{1}{y_t}$. If we derive the log-likelihood based on the log-normal density, here is what we get:
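$$\ell(\theta, \sigma_l^2 \mid Y) = -\frac{1}{2}\left(T \log(2\pi\sigma_l^2) + \sum_{t=1}^{T} \frac{(\log y_t - \log\mu_t)^2}{\sigma_l^2}\right) - \sum_{t=1}^{T} \log y_t,$$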
where $Y$ is the vector of all the actual values in the sample. When we extract the log-likelihood of the model in logarithms, we get only the first part of this expression, before the "$-\sum_{t=1}^{T} \log y_t$", which corresponds to the normal distribution. So, in order to produce the log-likelihood of the model with the variable in the original scale, we need to subtract the sum of logarithms of the response variable from the extracted log-likelihood.
The function AIC() in R, applied to the model in logarithms, will return a value based only on that first, normal part of the log-likelihood. In order to fix this and get AIC in the same scale as the variable $y_t$, we need to take the remaining part into account, modifying the AIC formula:
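$$\mathrm{AIC}^\prime = 2k - 2\ell + 2\sum_{t=1}^{T} \log y_t = \mathrm{AIC} + 2\sum_{t=1}^{T} \log y_t,$$
where $\ell$ here is the log-likelihood extracted from the model in logarithms.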
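The correction can be wrapped in a small helper. This is only a sketch: aicOriginalScale() is a hypothetical function name, not part of any package, and it assumes that the model was fitted with lm() and that its response is the natural logarithm of a strictly positive variable.

```r
# Hypothetical helper: AIC of a log-response lm, expressed in the scale of y.
# Assumes the model was fitted as lm(log(y) ~ ...), so the stored response
# is already log(y).
aicOriginalScale <- function(model) {
  # model.response() returns the response as used in the fit, i.e. log(y)
  logY <- model.response(model.frame(model))
  # Add twice the sum of log(y_t), the extra term of the log-normal likelihood
  AIC(model) + 2 * sum(logY)
}

# Usage on the built-in longley data
fitLog <- lm(log(GNP) ~ Employed, data = longley)
aicOriginalScale(fitLog)
```

Reading the response from the model frame avoids passing the original data separately, at the cost of relying on the user to apply the helper only to log-response models.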
Let's see an example in R. We will use the longley data from the datasets package. First we construct additive and multiplicative models:
modelAdditive <- lm(GNP ~ Employed, data = longley)
modelMultiplicative <- lm(log(GNP) ~ Employed, data = longley)
And extract the respective AICs:
AIC(modelAdditive)
# 142.7824
AIC(modelMultiplicative)
# -44.5661
As we see, the values are not comparable. Now let’s modify the second AIC:
AIC(modelMultiplicative) + 2 * sum(log(longley$GNP))
# 145.118
So, now the values are comparable, and we can conclude that the first model (additive) is better than the second one in terms of AIC.
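One way to check the adjustment is to evaluate the log-normal likelihood of GNP directly with dlnorm() and build AIC from it. The sketch below does that; the names sigmaML, llLogNormal, aicDirect and aicShortcut are illustrative, and the standard deviation is the maximum likelihood estimate (dividing by T), which is what logLik() uses for lm objects.

```r
modelMultiplicative <- lm(log(GNP) ~ Employed, data = longley)

# Maximum likelihood estimate of the residual standard deviation
# (divides by T rather than by the residual degrees of freedom)
sigmaML <- sqrt(mean(residuals(modelMultiplicative)^2))

# Log-likelihood of GNP itself under the log-normal distribution,
# with the fitted values of the log model as the location parameter
llLogNormal <- sum(dlnorm(longley$GNP,
                          meanlog = fitted(modelMultiplicative),
                          sdlog = sigmaML,
                          log = TRUE))

# Three estimated parameters: intercept, slope and variance
aicDirect <- 2 * 3 - 2 * llLogNormal

# The shortcut: AIC of the log model plus twice the sum of log(y_t)
aicShortcut <- AIC(modelMultiplicative) + 2 * sum(log(longley$GNP))

all.equal(aicDirect, aicShortcut)
```

Both routes should give the same number, because the log-normal density differs from the normal density of the logarithms only by the factor 1/y_t.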
A similar technique can be used for other transformations of the response variable (square root, Box-Cox), but the respective distributions of the variables in the original scale would need to be derived, which is not always a simple task.
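As a rough sketch of what this could look like for the square root, suppose we assume that sqrt(y_t) is normally distributed. This is only an approximation, because a normal variable can be negative while a square root cannot. Under that assumption, the change of variables z = sqrt(y) contributes the factor dz/dy = 1/(2 sqrt(y)) to the density of y, so the correction becomes 2 times the sum of log(2 sqrt(y_t)). The names modelSqrt, aicSqrt and aicSqrtDirect are illustrative.

```r
# Model for the square root of the response (assumed approximately normal)
modelSqrt <- lm(sqrt(GNP) ~ Employed, data = longley)
y <- longley$GNP

# Shortcut: the Jacobian of z = sqrt(y) is 1 / (2 * sqrt(y)), so the
# log-likelihood in the scale of y loses sum(log(2 * sqrt(y)))
aicSqrt <- AIC(modelSqrt) + 2 * sum(log(2 * sqrt(y)))

# Direct check: normal density of sqrt(y) times the Jacobian, on the log scale
sigmaML <- sqrt(mean(residuals(modelSqrt)^2))
llDirect <- sum(dnorm(sqrt(y), mean = fitted(modelSqrt), sd = sigmaML,
                      log = TRUE) - log(2 * sqrt(y)))
aicSqrtDirect <- 2 * 3 - 2 * llDirect

all.equal(aicSqrt, aicSqrtDirect)
```

The value of aicSqrt is then in the same scale as AIC(modelAdditive), subject to the approximation mentioned above.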