[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Splines in regression is something which looks like a black box (or
maybe like some dishes you get when you travel away from home: it tastes
good, but you don’t what’s inside… even if you might have some clues,
you never know for sure*). Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
That sounds nice, but when you look at the output of the regression… you got figures, but you barely see how to interpret them… So let us have a look at the box, and I mean what is inside that box…
> library(splines)
> K=c(14,20)
> plot(cars)
> reg=lm(dist~bs(speed,knots=c(K),
+ degree=2),data=cars)
> u=seq(4,25,by=.1)
> B=data.frame(speed=u)
> Y=predict(reg,newdata=B)
> lines(u,Y,lwd=2,col=”red”)
I.e. we have the following (nice) picture,
> summary(reg)
Call:
lm(formula = dist ~ bs(speed, knots = c(K), degree = 2), data = cars)
Residuals:
Min 1Q Median 3Q Max
-21.848 -8.702 -1.667 6.508 42.514
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.991 9.692 0.618 0.539596
bs(speed, knots = c(K), degree = 2)1 8.087 15.278 0.529 0.599174
bs(speed, knots = c(K), degree = 2)2 45.540 10.735 4.242 0.000109 ***
bs(speed, knots = c(K), degree = 2)3 49.789 15.704 3.170 0.002738 **
bs(speed, knots = c(K), degree = 2)4 95.550 13.651 7.000 1.02e-08 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15 on 45 degrees of freedom
Multiple R-squared: 0.6888, Adjusted R-squared: 0.6611
F-statistic: 24.89 on 4 and 45 DF, p-value: 6.49e-11
So, what can we do with those numbers ?
First, assume know that we consider only one knot (we have to start somewhere), and we consider a b-spline interpolation of degree 1 (i.e. linear by parts).
> K=c(14)
> reg=lm(dist~bs(speed,knots=c(K),degree=1),data=cars)
> summary(reg)
Call:
lm(formula = dist ~ bs(speed, knots = c(K), degree = 1), data = cars)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.290 7.746 0.425 0.6730
bs(speed, knots = c(K), degree = 1)1 31.483 9.670 3.256 0.0021 **
bs(speed, knots = c(K), degree = 1)2 80.525 9.038 8.910 1.16e-11 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.41 on 47 degrees of freedom
Multiple R-squared: 0.657, Adjusted R-squared: 0.6424
F-statistic: 45.01 on 2 and 47 DF, p-value: 1.203e-11
> B=function(x,j,n,K){
+ b=0
+ a1=a2=0
+ if(((K[j+n+1]>K[j+1])&(j+n<=length(K))&(n>0))==TRUE){a2=(K[j+n+1]-x)/
+ (K[j+n+1]-K[j+1])*B(x,j+1,n-1,K) }
+ if(((K[j+n]>K[j])&(n>0))==TRUE){a1=(x-K[j])/
+ (K[j+n]-K[j])*B(x,j,n-1,K)}
+ if(n==0){ b=((x>K[j])&(x<=K[j+1]))*1 }
+ if(n>0){ b=a1+a2}
+ return(b)
+ }
So, for instance, for splines of degree 1, we have
> u=seq(0,1,by=.01)
> plot(u,B(u,1,1,c(0,.4,1,1)),lwd=2,col=”red”,type=”l”,ylim=c(0,1))
> lines(u,B(u,2,1,c(0,.4,1,1)),lwd=2,col=”blue”)
> abline(v=c(0,.4,1),lty=2)
> u=seq(0,1,by=.01)
> plot(u,B(u,1,2,c(0,0,.4,1,1,1)),lwd=2,col=”red”,type=”l”,ylim=c(0,1))
> lines(u,B(u,2,2,c(0,0,.4,1,1,1)),lwd=2,col=”blue”)
> lines(u,B(u,3,2,c(0,0,.4,1,1,1)),lwd=2,col=”green”)
> abline(v=c(0,.4,1),lty=2)
So, how do we use that, here ? Actually, there are two steps:
- we get from
to the unit interval (using a simple affine transformation) - we consider
> u0=seq(0,1,by=.01)
> v=reg$coefficients[2]*u0+reg$coefficients[1]
> x1=seq(min(cars$speed),K,length=length(u0))
> lines(x1,v,col=”green”,lwd=2)
> u0=seq(0,1,by=.01)
> v=(reg$coefficients[3]-reg$coefficients[2])*u0+ reg$coefficients[1]+reg$coefficients[2]
> x2=seq(K,max(cars$speed),length=length(u0))
> lines(x2,v,col=”blue”,lwd=2)
> plot(cars)
> reg=lm(dist~bs(speed,knots=c(K),degree=1),data=cars)
> k=(K-min(cars$speed))/(max(cars$speed)-min(cars$speed))
> u0=seq(0,1,by=.01)
> v=reg$coefficients[1]+
+ reg$coefficients[2]*B(u0,1,1,c(0,k,1,1))+
+ reg$coefficients[3]*B(u0,2,1,c(0,k,1,1))
> lines(min(cars$speed)+u0*(max(cars$speed)-min(cars$speed)),
+ v,col=”purple”,lwd=2)
> abline(v=K,lty=2,col=”red”)
> K=c(14,20)
> plot(cars)
> reg=lm(dist~bs(speed,knots=c(K),degree=1),data=cars)
> u=seq(4,25,by=.1)
> B=data.frame(speed=u)
> Y=predict(reg,newdata=B)
> lines(u,Y,lwd=2,col=”red”)
> abline(v=K,lty=2,col=”red”)
First, we can plot our basis functions, with two knots,
> u=seq(0,1,by=.01)
> plot(u,B(u,1,1,c(0,.4,.7,1)),lwd=2,col=”red”,type=”l”,ylim=c(0,1))
> lines(u,B(u,2,1,c(0,.4,.7,1,1)),lwd=2,col=”blue”)
> lines(u,B(u,3,1,c(0,.4,.7,1,1)),lwd=2,col=”green”)
> abline(v=c(0,.4,.7,1),lty=2)
> plot(cars)
> reg=lm(dist~bs(speed,knots=c(K),degree=1),data=cars)
> k=(K-min(cars$speed))/(max(cars$speed)-min(cars$speed))
> u0=seq(0,1,by=.01)
> v=reg$coefficients[1]+
+ reg$coefficients[2]*B(u0,1,1,c(0,k,1,1))+
+ reg$coefficients[3]*B(u0,2,1,c(0,k,1,1))+
+ reg$coefficients[4]*B(u0,3,1,c(0,k,1,1))
> lines(min(cars$speed)+u0*(max(cars$speed)-min(cars$speed)),
+ v,col=”red”,lwd=2)
> abline(v=K,lty=2,col=”red”)
> u=seq(0,1,by=.01)
> plot(u,B(u,1,2,c(0,0,.4,.7,1,1,1)),lwd=2,col=”red”,type=”l”,ylim=c(0,1))
> lines(u,B(u,2,2,c(0,0,.4,.7,1,1,1)),lwd=2,col=”blue”)
> lines(u,B(u,3,2,c(0,0,.4,.7,1,1,1)),lwd=2,col=”green”)
> lines(u,B(u,4,2,c(0,0,.4,.7,1,1,1)),lwd=2,col=”orange”)
> abline(v=c(0,.4,.7,1),lty=2)
> plot(cars)
> reg=lm(dist~bs(speed,knots=c(K),degree=2),data=cars)
> k=(K-min(cars$speed))/(max(cars$speed)-min(cars$speed))
> u0=seq(0,1,by=.01)
> v=reg$coefficients[1]+
+ reg$coefficients[2]*B(u0,1,2,c(0,0,k,1,1,1))+
+ reg$coefficients[3]*B(u0,2,2,c(0,0,k,1,1,1))+
+ reg$coefficients[4]*B(u0,3,2,c(0,0,k,1,1,1))+
+ reg$coefficients[5]*B(u0,4,2,c(0,0,k,1,1,1))
> lines(min(cars$speed)+u0*(max(cars$speed)-min(cars$speed)),
+ v,col=”purple”,lwd=2)
> abline(v=K,lty=2,col=”red”)
> vk=seq(.05,.95,by=.05)
> SSR=matrix(NA,length(vk))
> for(i in 1:(length(vk))){
+ k=vk[i]
+ K=min(cars$speed)+k*(max(cars$speed)-min(cars$speed))
+ reg=lm(dist~bs(speed,knots=c(K),degree=2),data=cars)
+ SSR[i]=sum(residuals(reg)^2)
+ }
> plot(vk,SSR,type=”b”,col=”blue”)
To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.