Some heuristics about spline smoothing
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Let us continue our discussion on smoothing techniques in regression. Assume that . where is some unkown function, but assumed to be sufficently smooth. For instance, assume that is continuous, that exists, and is continuous, that exists and is also continuous, etc. If is smooth enough, Taylor’s expansion can be used. Hence, for
which can also be writen as
for some ‘s. The first part is simply a polynomial.
The second part, is some integral. Using Riemann integral, observe that
for some ‘s, and some
Thus,
Nice! We have our linear regression model. A natural idea is then to consider a regression of on where
given some knots . To make things easier to understand, let us work with our previous dataset,
plot(db)
If we consider one knot, and an expansion of order 1,
attach(db) library(splines) B=bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1) reg=lm(yr~B) lines(xr[xr<=3],predict(reg)[xr<=3],col="red") lines(xr[xr>=3],predict(reg)[xr>=3],col="blue")
The prediction obtained with this spline can be compared with regressions on subsets (the doted lines)
reg=lm(yr~xr,subset=xr<=3) lines(xr[xr<=3],predict(reg)[xr<=3],col="red",lty=2) reg=lm(yr~xr,subset=xr>=3) lines(xr[xr>=3],predict(reg),col="blue",lty=2)
It is different, since we have here three parameters (and not four, as for the regressions on the two subsets). One degree of freedom is lost, when asking for a continuous model. Observe that it is possible to write, equivalently
reg=lm(yr~bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1),data=db)
So, what happened here?
B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1) matplot(xr,B,type="l") abline(v=c(0,2,5,10),lty=2)
Here, the functions that appear in the regression are the following
Now, if we run the regression on those two components, we get
B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1) matplot(xr,B,type="l") abline(v=c(0,2,5,10),lty=2)
If we add one knot, we get
the prediction is
reg=lm(yr~B) lines(xr,predict(reg),col="red")
Of course, we can choose much more knots,
B=bs(xr,knots=1:9,Boundary.knots=c(0,10),degre=1) reg=lm(yr~B) lines(xr,predict(reg),col="red")
We can even get a confidence interval
reg=lm(yr~B) P=predict(reg,interval="confidence") plot(db,col="white") polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA) points(db) reg=lm(yr~B) lines(xr,P[,1],col="red") abline(v=c(0,2,5,10),lty=2)
And if we keep the two knots we chose previously, but consider Taylor’s expansion of order 2, we get
B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=2) matplot(xr,B,type="l") abline(v=c(0,2,5,10),lty=2)
So, what’s going on? If we consider the constant, and the first component of the spline based matrix, we get
k=2 plot(db) B=cbind(1,B) lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
If we add the constant term, the first term and the second term, we get the part on the left, before the first knot,
k=3 lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
and with three terms from the spline based matrix, we can get the part between the two knots,
k=4 lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
and finallty, when we sum all the terms, we get this time the part on the right, after the last knot,
k=5 lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
This is what we get using a spline regression, quadratic, with two (fixed) knots. And can can even get confidence intervals, as before
reg=lm(yr~B) P=predict(reg,interval="confidence") plot(db,col="white") polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA) points(db) reg=lm(yr~B) lines(xr,P[,1],col="red") abline(v=c(0,2,5,10),lty=2)
The great idea here is to use functions , that will insure continuity at point .
Of course, we can use those splines on our Dexter application,
Here again, using linear spline function, it is possible to impose a continuity constraint,
plot(data$no,data$mu,ylim=c(6,10)) abline(v=12*(0:8)+.5,lty=2) reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97), degre=1),data=db) lines(c(1:94,96),predict(reg),col="red")
But we can also consider some quadratic splines,
plot(data$no,data$mu,ylim=c(6,10)) abline(v=12*(0:8)+.5,lty=2) reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97), degre=2),data=db) lines(c(1:94,96),predict(reg),col="red")
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.