Site icon R-bloggers

An Update on Boosting with Splines

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically linear by parts and continuous regression functions). I was using

> library(splines)
> fit=lm(y~bs(x,degree=1,df=3),data=df)

The problem with that spline function is that knots seem to be fixed. The iterative boosting algorithm is

then the strategy is to model those residuals

and to loop. Then set

I thought that boosting would work well if at step , it was possible to change the knots. But the output

was quite disappointing: boosting does not improve the prediction here. And it looks like knots don’t change. Actually, if we select the ‘best‘ knots, the output is much better. The dataset is still

> n=300
> set.seed(1)
> u=sort(runif(n)*2*pi)
> y=sin(u)+rnorm(n)/4
> df=data.frame(x=u,y=y)

For an optimal choice of knot locations, we can use

> library(freeknotsplines)
> xy.freekt=freelsgen(df$x, df$y, degree = 1, 
+ numknot = 2, 555)

The code of the previous post can simply be updated

> v=.05
> library(splines)
> xy.freekt=freelsgen(df$x, df$y, degree = 1, 
+ numknot = 2, 555)
> fit=lm(y~bs(x,degree=1,knots=
+ xy.freekt@optknot),data=df)
> yp=predict(fit,newdata=df)
> df$yr=df$y - v*yp
> YP=v*yp
>  for(t in 1:200){
+    xy.freekt=freelsgen(df$x, df$yr, degree = 1,
+    numknot = 2, 555)
+ fit=lm(yr~bs(x,degree=1,knots=
+     xy.freekt@optknot),data=df)
+    yp=predict(fit,newdata=df)
+    df$yr=df$yr - v*yp
+    YP=cbind(YP,v*yp)
+  }
>  nd=data.frame(x=seq(0,2*pi,by=.01))
>  viz=function(M){
+    if(M==1)  y=YP[,1]
+    if(M>1)   y=apply(YP[,1:M],1,sum)
+    plot(df$x,df$y,ylab="",xlab="")
+    lines(df$x,y,type="l",col="red",lwd=3)
+    fit=lm(y~bs(x,degree=1,df=3),data=df)
+    yp=predict(fit,newdata=nd)
+    lines(nd$x,yp,type="l",col="blue",lwd=3)
+    lines(nd$x,sin(nd$x),lty=2)}
 
>  viz(100)

I like that graph. I had the intuition that using (simple) splines would be possible, and indeed, we get a very smooth prediction.

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.