Let us continue our discussion on smoothing techniques in regression. Assume that
$$y_i = m(x_i) + \varepsilon_i,$$
which can also be written as
$$y_i = m(0) + \int_0^{x_i} m'(t)\,dt + \varepsilon_i$$
for some (unknown) smooth function $m:\mathbb{R}\rightarrow\mathbb{R}$. The second part is some integral. Using a Riemann-type approximation of that integral, with $m'$ taken as constant between consecutive knots $t_1<\cdots<t_k$, observe that
$$\int_0^{x} m'(t)\,dt \approx \sum_{j=1}^{k}\beta_j\,(x-t_j)_+,\qquad\text{where } (x-t)_+=\max\{x-t,0\},$$
for some coefficients $\beta_1,\dots,\beta_k$. Thus,
$$y_i \approx \beta_0 + \sum_{j=1}^{k}\beta_j\,(x_i-t_j)_+ + \varepsilon_i.$$
Nice! We have our linear regression model, since the right-hand side is linear in the transformed covariates $(x_i-t_j)_+$. A natural idea is then to consider a regression of $Y$ on the constant and on $(X-t_1)_+,\dots,(X-t_k)_+$, given some knots $\{t_1,\dots,t_k\}$.
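To make the link with the code below explicit, here is a minimal hand-made version of such a regression (a sketch, not code from this post): one interior knot at 3, a hinge helper pos() introduced purely for illustration, and db assumed to be the simulated dataset (with columns xr and yr) used throughout.

pos=function(u) pmax(u,0)                 # hinge function: pos(x-t) is (x-t)_+
reg_hinge=lm(yr~xr+pos(xr-3),data=db)     # continuous broken-line model, one knot at 3
coefficients(reg_hinge)                   # intercept, slope, and change of slope at the knot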
plot(db)
If we consider one knot, and an expansion of order 1,
attach(db)
library(splines)
B=bs(xr,knots=c(3),Boundary.knots=c(0,10),degree=1)
reg=lm(yr~B)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red")
lines(xr[xr>=3],predict(reg)[xr>=3],col="blue")
The prediction obtained with this spline can be compared with regressions on subsets (the dotted lines)
reg=lm(yr~xr,subset=xr<=3)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red",lty=2)
reg=lm(yr~xr,subset=xr>=3)
lines(xr[xr>=3],predict(reg),col="blue",lty=2)
It is different, since here we have three parameters (and not four, as in the regressions on the two subsets): one degree of freedom is lost when asking for a continuous model. Observe that it is possible to write this model, equivalently, as
reg=lm(yr~bs(xr,knots=c(3),Boundary.knots=c(0,10),degree=1),data=db)
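As a quick sanity check of that degrees-of-freedom claim (again a sketch, with reg_spline, reg_left and reg_right being illustrative names, not objects from this post), we can simply count the fitted coefficients:

reg_spline=lm(yr~bs(xr,knots=c(3),Boundary.knots=c(0,10),degree=1))
length(coefficients(reg_spline))            # 3 parameters: intercept + two spline components
reg_left=lm(yr~xr,subset=xr<=3)
reg_right=lm(yr~xr,subset=xr>=3)
length(coefficients(reg_left))+length(coefficients(reg_right))   # 4 parameters: two lines, no continuity constraint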
So, what happened here?
B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degree=1)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)
Here, the functions that appear in the regression are the three components plotted above (the columns of B).
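To see those functions as numbers rather than curves (a small sketch, not in the original code), we can inspect the basis matrix itself:

dim(B)              # one row per observation, one column per basis function
head(round(B,3))    # each row gives the value of the three basis functions at one x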
Now, if we run the regression on those components (compared with the previous fit, one knot has been added, so there are three components instead of two), the prediction is
plot(db)
reg=lm(yr~B)
lines(xr,predict(reg),col="red")
Of course, we can choose much more knots,
B=bs(xr,knots=1:9,Boundary.knots=c(0,10),degree=1)
reg=lm(yr~B)
lines(xr,predict(reg),col="red")
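As an aside (not something used in this post), bs() can also place the knots automatically, at quantiles of the covariate, if we supply df instead of knots; with degree=1, asking for df=10 gives nine interior knots:

B2=bs(xr,df=10,degree=1)          # nine interior knots chosen at quantiles of xr
reg2=lm(yr~B2)
lines(xr,predict(reg2),col="purple")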
We can even get a confidence interval
reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)
And if we keep the two knots we chose previously, but consider Taylor’s expansion of order 2, we get
B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degree=2)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)
So, what's going on? If we consider the constant and the first component of the spline basis matrix, we get
reg=lm(yr~B)   # (re)fit the regression on this quadratic spline basis, so the coefficients below match it
k=2
plot(db)
B=cbind(1,B)   # add the constant term as a first column, to align with the intercept
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
If we add the constant term, the first term and the second term, we get the part on the left, before the first knot,
k=3
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
and with three terms from the spline basis matrix, we can get the part between the two knots,
k=4
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
and finally, when we sum all the terms, we get the part on the right, after the last knot,
k=5
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)
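As a quick sanity check (a sketch, not in the original code), summing the contributions of all five columns reproduces the fitted values of that regression exactly:

max(abs(B%*%coefficients(reg)-predict(reg)))    # numerically zero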
This is what we get using a quadratic spline regression with two (fixed) knots. And we can even get confidence intervals, as before
reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)
The great idea here is to use simple basis functions, piecewise polynomials that are nonzero only on a few adjacent intervals between knots, so that the smoothing problem becomes an ordinary linear regression.
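To make that locality visible (a small sketch, not in the original code, with B1 an illustrative name), we can rebuild the linear basis with knots at 1,...,9 and look at the range of x on which each column is nonzero:

B1=bs(xr,knots=1:9,Boundary.knots=c(0,10),degree=1)
apply(B1,2,function(b) range(xr[b>1e-12]))      # support of each basis function: a couple of adjacent intervals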
Of course, we can use those splines on our Dexter application. Here again, using linear spline functions, it is possible to impose a continuity constraint,
plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),degree=1),data=db)
lines(c(1:94,96),predict(reg),col="red")
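For comparison (a small sketch, not from the original post, assuming the data frame data with columns no and mu used above, and seasons of twelve episodes), here are the raw per-season averages, which, unlike the spline fit, jump at the season boundaries:

season=cut(data$no,breaks=12*(0:8)+.5)                     # group episodes by season
mu_season=tapply(data$mu,season,mean,na.rm=TRUE)           # average rating per season
segments(12*(0:7)+.5,mu_season,12*(1:8)+.5,mu_season,col="blue",lty=3)   # piecewise constant, discontinuous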
But we can also consider some quadratic splines,
plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),degree=2),data=db)
lines(c(1:94,96),predict(reg),col="red")