When discussing transformations in regression models, I usually briefly introduce the Box-Cox transform (see e.g. an old post on that topic) and I also mention local regressions and nonparametric estimators (see e.g. another post). But while I was working on my ACT6420 course (on predictive modeling, which is a VEE for the SOA), I read something about a “Ladder of Powers Rule”, also called “Tukey and Mosteller’s Bulging Rule”. To be honest, I had never heard about this rule before. But that won’t be the first time I learn something while working on my notes for a course!
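The point here is that, in a standard linear regression model, we have
$$Y=\beta_0+\beta_1 X+\varepsilon.$$
But sometimes, a linear relationship is not appropriate. One idea can be to transform the variable we would like to model, $Y$, and to consider
$$\varphi(Y)=\beta_0+\beta_1 X+\varepsilon.$$
This is what we usually do with the Box-Cox transform. Another idea can be to transform the explanatory variable, $X$, and to consider
$$Y=\beta_0+\beta_1 \psi(X)+\varepsilon.$$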
For instance, this year in the course, we considered, at some point, a continuous piecewise linear function,
$$Y=\beta_0+\beta_1 X+\beta_2 (X-s)_+ +\varepsilon.$$
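As a minimal sketch of such a continuous piecewise linear fit, a hinge term max(x - s, 0) can be built with pmax(). The built-in cars dataset and the knot s = 15 are assumptions made only for illustration; they are not taken from the course.

```r
# Continuous piecewise linear regression with a single knot (illustrative sketch).
# Assumptions: the cars dataset and the knot location s = 15 are arbitrary choices.
s <- 15

fit_lin   <- lm(dist ~ speed, data = cars)
fit_piece <- lm(dist ~ speed + pmax(speed - s, 0), data = cars)

# The second coefficient is the slope below the knot; the third one is the
# change in slope above it, so the fitted line stays continuous at speed = s.
coef(fit_piece)

# The straight line is nested in the broken line, so an F-test compares them.
anova(fit_lin, fit_piece)
```

The knot is fixed in advance here; choosing it from the data would be a different (nonlinear) estimation problem.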
It is also possible to consider polynomial regression. “Tukey and Mosteller’s Bulging Rule” is based on the following figure.
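and the idea is that it might be interesting to transform $X$ and $Y$ at the same time, using power functions, i.e. to consider a linear model
$$Y^q=\beta_0+\beta_1 X^p+\varepsilon$$
for some (positive) parameters $p$ and $q$.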
To be more specific, let us generate data from different models, and let us look at the associated scatterplots,
> fakedataMT=function(p=1,q=1,n=99,s=.1){
+   set.seed(1)
+   X=seq(1/(n+1),1-1/(n+1),length=n)
+   Y=(5+2*X^p+rnorm(n,sd=s))^(1/q)
+   return(data.frame(x=X,y=Y))}
> par(mfrow=c(2,2))
> plot(fakedataMT(p=.5,q=2),main="(p=1/2,q=2)")
> plot(fakedataMT(p=3,q=-5),main="(p=3,q=-5)")
> plot(fakedataMT(p=.5,q=-1),main="(p=1/2,q=-1)")
> plot(fakedataMT(p=3,q=5),main="(p=3,q=5)")
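The simulated data are built so that Y^q is linear in X^p, which suggests a quick check: regress y^q on x^p with the same powers and see whether the coefficients 5 and 2 are roughly recovered. The sketch below does this for one pair; the choice (p=3, q=5) and the comparison with an untransformed fit are illustrative assumptions, and fakedataMT() is redefined only so that the block runs on its own.

```r
# Same generator as in the post, repeated here to keep the sketch self-contained.
fakedataMT <- function(p = 1, q = 1, n = 99, s = .1) {
  set.seed(1)
  X <- seq(1/(n + 1), 1 - 1/(n + 1), length = n)
  Y <- (5 + 2 * X^p + rnorm(n, sd = s))^(1/q)
  data.frame(x = X, y = Y)
}

p <- 3; q <- 5                       # illustrative pair, one of the four panels
d <- fakedataMT(p = p, q = q)

raw <- lm(y ~ x, data = d)           # straight line on the original scale
pow <- lm(I(y^q) ~ I(x^p), data = d) # straight line after the double transformation

# Intercept and slope should be close to the simulation values 5 and 2.
coef(pow)

# R-squared values are on different response scales, so this is only a rough
# indication that the transformed relationship is closer to linear.
c(raw = summary(raw)$r.squared, transformed = summary(pow)$r.squared)
```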
If we consider the South-West part of the graph, to get such a pattern, we can consider
or more generally
where
Let us visualize that double transformation on a dataset, say the cars dataset.
> base=cars
> names(base)=c("x","y")
> MostellerTukey=function(p=1,q=1){
+   # linear model on the transformed scale: Y^q against X^p
+   regpq=lm(I(y^q)~I(x^p),data=base)
+   u=seq(min(min(base$x)-2,.1),max(base$x)+2,length=501)
+   par(mfrow=c(1,2))
+   # left panel: original scale, bounds mapped back with the power 1/q
+   plot(base$x,base$y,xlab="X",ylab="Y",col="white")
+   vic=predict(regpq,newdata=data.frame(x=u),interval="prediction")
+   vic[vic<=0]=.1   # keep values positive before taking the power 1/q
+   polygon(c(u,rev(u)),c(vic[,2],rev(vic[,3]))^(1/q),col="light blue",density=40,border=NA)
+   lines(u,vic[,2]^(1/q),col="blue")
+   lines(u,vic[,3]^(1/q),col="blue")
+   v=predict(regpq,newdata=data.frame(x=u))^(1/q)
+   lines(u,v,col="blue")
+   points(base$x,base$y)
+   # right panel: transformed scale, where the fit is a straight line
+   plot(base$x^p,base$y^q,xlab=paste("X^",p,sep=""),ylab=paste("Y^",q,sep=""),col="white")
+   polygon(c(u,rev(u))^p,c(vic[,2],rev(vic[,3])),col="light blue",density=40,border=NA)
+   lines(u^p,vic[,2],col="blue")
+   lines(u^p,vic[,3],col="blue")
+   abline(regpq,col="blue")
+   points(base$x^p,base$y^q)
+ }
For instance, if we call
> MostellerTukey(2,1)
we get the following graph,
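On the left, we have the original dataset, $(X_i,Y_i)$, and on the right, the transformed one, $(X_i^2,Y_i)$, with the fitted line and its prediction band.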
Note that here, it would have been possible to consider another transformation, giving a similar shape from a quite different pair of powers,
> MostellerTukey(1,.5)
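To put the two candidate pairs side by side, one rough option is to map the fitted values back to the original scale of Y and compare the root mean squared errors. This yardstick is an assumption of the sketch, not something computed in the post, and rmse_pq() is a hypothetical helper name.

```r
# Compare power transformations of the cars data on the original scale of Y.
# Assumption: RMSE of back-transformed fitted values is used as a rough yardstick.
base <- cars
names(base) <- c("x", "y")

rmse_pq <- function(p, q) {
  fit <- lm(I(y^q) ~ I(x^p), data = base)
  # fitted values live on the Y^q scale; undo the power to return to Y
  yhat <- fitted(fit)^(1/q)
  sqrt(mean((base$y - yhat)^2))
}

res <- c("(p=1,q=1)"   = rmse_pq(1, 1),
         "(p=2,q=1)"   = rmse_pq(2, 1),
         "(p=1,q=0.5)" = rmse_pq(1, .5))
res
```

When q differs from 1, the back-transformed fitted values are not estimates of the conditional mean of Y, so the comparison should be read as indicative only.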
Of course, there is no reason to consider a simple power function, and the Box-Cox transform can also be used. The interesting point is that the logarithm can be obtained as a particular case. Furthermore, it is also possible to seek optimal transformations, seen here as a pair of parameters. Consider
> library(MASS)
> library(RColorBrewer)
> vp=seq(.1,3,by=.1)
> vq=seq(.1,3,by=.1)
> # profile log-likelihood, rows indexed by p and columns by q
> bc=t(sapply(vp,function(p) boxcox(y~I(x^p),data=base,lambda=vq,plotit=FALSE)$y))
> blues=colorRampPalette(brewer.pal(9,"Blues"))(100)
> image(vp,vq,bc,col=blues,xlab="p",ylab="q")
> contour(vp,vq,bc,levels=seq(-60,-40,by=1),col="white",add=TRUE)
The darker, the better (here the log-likelihood is considered). The optimal pair is here
> # minus the Box-Cox profile log-likelihood at the pair a=(p,q)
> bc=function(a){p=a[1];q=a[2];
+   as.numeric(-boxcox(y~I(x^p),data=base,lambda=q)$y[50])}
> optim(c(1,1), bc,method="L-BFGS-B",lower=c(0,0),upper=c(3,3))
$par
[1] 0.5758362 0.3541601
$value
[1] 47.27395
and indeed, the model we get is not bad,
Fun, isn’t it?