Estimating continuous piecewise linear regression
[This article was first published on R snippets, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When talking about smoothing splines, a simple place to start is a continuous piecewise linear regression with fixed knots. I did not find any simple example showing how to estimate it in GNU R, so I have created a little snippet that does the job.
Assume you are given a continuous predictor x and a continuous response variable y. We want to estimate a continuous piecewise linear regression with fixed knots, stored in the variable knots, using the standard lm procedure.
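The key to the solution is a proper definition of the regression formula. To allow the slope to change at knot k, we have to add a so-called hinge term, max(0, x-k), to the model. In R the hinge is computed with the vectorised pmax, since max would collapse the whole vector to a single number.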
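To make the hinge term concrete, here is a minimal sketch with a single knot. The values of k, a and b and the helper name hinge are assumptions chosen only for illustration; they do not come from the snippet in this post. It shows that adding b * max(0, x-k) to a * x keeps the function continuous and changes its slope from a to a + b at the knot.

```r
# Hypothetical values, chosen only for illustration
k <- 0.5   # knot location
a <- 1     # slope to the left of the knot
b <- 2     # change of slope at the knot

# Hinge term; pmax works elementwise, so it accepts a whole vector of x
hinge <- function(x, k) pmax(0, x - k)

x <- c(0, 0.25, 0.5, 0.75, 1)
y <- a * x + b * hinge(x, k)

# Slopes between consecutive points: a before the knot, a + b after it
slopes <- diff(y) / diff(x)
print(cbind(x, hinge = hinge(x, k), y))
print(slopes)
```

With several knots, each hinge coefficient is the change of slope at its own knot, which is why one term per knot is enough.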
In the code given below, the function piece.formula automatically generates the proper right-hand side of the regression formula, given a variable name and a vector of required knots. It is then tested on a simple function.
N <- 40 # number of sampled points
K <- 5  # number of knots

# Build the right-hand side of the formula:
# the variable itself plus one hinge term per knot
piece.formula <- function(var.name, knots) {
  # write "x + |k|" for negative knots so the text never contains "x - -k"
  formula.sign <- rep(" - ", length(knots))
  formula.sign[knots < 0] <- " + "
  paste(var.name, "+",
        paste("I(pmax(", var.name, formula.sign, abs(knots), ", 0))",
              collapse = " + ", sep = ""))
}

# test function
f <- function(x) {
  2 * sin(6 * x)
}

set.seed(1)
x <- seq(-1, 1, len = N)
y <- f(x) + rnorm(length(x))
# K equally spaced interior knots
knots <- seq(min(x), max(x), len = K + 2)[-c(1, K + 2)]
model <- lm(formula(paste("y ~", piece.formula("x", knots))))

par(mar = c(4, 4, 1, 1))
plot(x, y)
lines(x, f(x))
new.x <- seq(min(x), max(x), len = 10000)
# predictions on a dense grid, drawn as individual points
points(new.x, predict(model, newdata = data.frame(x = new.x)),
       col = "red", pch = ".")
# predictions at the knots
points(knots, predict(model, newdata = data.frame(x = knots)),
       col = "red", pch = 18)

The plot produced by this code shows the estimation result. The red line is the desired continuous piecewise linear regression with fixed knots, which are marked by red diamonds. Notice that the red line is drawn with the points procedure rather than lines, to highlight that the generated predictions themselves have the required properties.
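As a quick check of what piece.formula produces and how to read the fitted coefficients, the sketch below repeats the function from the snippet and applies it to three knots. The knots, the midpoints and the helper slope.at are assumptions made for this illustration. Because each hinge coefficient is a change of slope, the slope on each segment should equal the cumulative sum of the non-intercept coefficients; the code compares this with slopes measured numerically from predict.

```r
# piece.formula repeated from the snippet so that this block runs on its own
piece.formula <- function(var.name, knots) {
  formula.sign <- rep(" - ", length(knots))
  formula.sign[knots < 0] <- " + "
  paste(var.name, "+",
        paste("I(pmax(", var.name, formula.sign, abs(knots), ", 0))",
              collapse = " + ", sep = ""))
}

knots <- c(-0.5, 0, 0.5)   # hypothetical knots for this sketch
rhs <- piece.formula("x", knots)
print(rhs)
# "x + I(pmax(x + 0.5, 0)) + I(pmax(x - 0, 0)) + I(pmax(x - 0.5, 0))"

set.seed(1)
x <- seq(-1, 1, len = 40)
y <- 2 * sin(6 * x) + rnorm(length(x))
model <- lm(formula(paste("y ~", rhs)))

# Slope on each segment: coefficient of x plus the hinge coefficients
# of all knots lying to the left of that segment
segment.slopes <- cumsum(unname(coef(model))[-1])

# Numerical check with central differences at the segment midpoints
mid <- c(-0.75, -0.25, 0.25, 0.75)
eps <- 1e-3
slope.at <- function(x0) {
  p <- predict(model, newdata = data.frame(x = c(x0 - eps, x0 + eps)))
  unname(diff(p)) / (2 * eps)
}
num.slopes <- sapply(mid, slope.at)
print(cbind(segment.slopes, num.slopes))
```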
An additional advantage of the presented solution is that we do not need to preprocess the predictor variable when we want to make a prediction: all the calculations are made within the formula.
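The sketch below illustrates this point under assumed knots and new data values, with the formula written out by hand. It calls predict on raw values of x, then rebuilds the same predictions manually from a design matrix of hinge columns. The two should agree, which suggests that predict re-evaluates the hinge terms for us.

```r
set.seed(1)
x <- seq(-1, 1, len = 40)
y <- 2 * sin(6 * x) + rnorm(length(x))
knots <- c(-0.5, 0, 0.5)   # hypothetical knots for this sketch

# Hinge terms written directly in the formula
model <- lm(y ~ x + I(pmax(x + 0.5, 0)) + I(pmax(x, 0)) + I(pmax(x - 0.5, 0)))

# Prediction from the raw predictor only; no hinge columns are prepared
new.x <- c(-0.9, -0.2, 0.3, 0.8)
auto <- predict(model, newdata = data.frame(x = new.x))

# The same prediction done by hand: intercept, x, one hinge column per knot
X.new <- cbind(1, new.x, sapply(knots, function(k) pmax(new.x - k, 0)))
manual <- drop(X.new %*% coef(model))

print(all.equal(unname(auto), manual))
```

Had the hinge columns been computed beforehand and passed to lm as separate variables, the same transformation would have to be repeated on every new data set before calling predict.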
Of course, this simple example can easily be extended to obtain a simple smoother. For example, we can set K to be large and use a regularized regression such as ridge or lasso.
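One possible way to try the ridge variant is sketched below in base R, using the closed-form ridge solution instead of a dedicated package. The number of knots, the penalty lambda and the helper names hinge.basis and penalty are assumptions made for this illustration; in practice lambda would have to be tuned, for example by cross-validation. Only the hinge coefficients (the changes of slope) are penalised, so the intercept and the overall slope stay unrestricted.

```r
set.seed(1)
N <- 40
K <- 20       # many knots (hypothetical choice)
lambda <- 1   # ridge penalty (hypothetical value, not tuned)

x <- seq(-1, 1, len = N)
y <- 2 * sin(6 * x) + rnorm(N)
knots <- seq(min(x), max(x), len = K + 2)[-c(1, K + 2)]

# Design matrix: intercept, x, and one hinge column per knot
hinge.basis <- function(x, knots) {
  cbind(1, x, sapply(knots, function(k) pmax(x - k, 0)))
}
X <- hinge.basis(x, knots)

# Penalise only the hinge coefficients, not the intercept or the slope of x
D <- diag(c(0, 0, rep(1, K)))
beta.ridge <- drop(solve(crossprod(X) + lambda * D, crossprod(X, y)))

# Unpenalised least squares fit on the same basis, for comparison
beta.ols <- unname(lm.fit(X, y)$coefficients)

# Sum of squared changes of slope: smaller means a smoother fit
penalty <- function(beta) sum(beta[-(1:2)]^2)
print(c(ols = penalty(beta.ols), ridge = penalty(beta.ridge)))

# Smoothed predictions on a grid
new.x <- seq(min(x), max(x), len = 200)
smooth <- drop(hinge.basis(new.x, knots) %*% beta.ridge)
```

Larger values of lambda shrink the changes of slope towards zero, so the fit moves from the wiggly least squares solution towards a single straight line.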