Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
TLDR: Pass the output of the isoreg function to as.stepfun to turn an isotonic regression model into a black box object that takes in uncalibrated predictions and outputs calibrated ones.
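Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data (x_1, y_1), ..., (x_n, y_n) with x_1 <= ... <= x_n. Isotonic regression finds fitted values that are non-decreasing in x and as close as possible to the y_i in the least-squares sense.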
Isotonic regression can be performed easily in R with the stats package’s isoreg function. Note the slightly unusual syntax when pulling out the fitted values: fit$yf is stored in order of increasing x, so it has to be paired with fit$x[fit$ord] rather than fit$x (see the function’s documentation with ?isoreg for details). The plot shows the original data values as black crosses and the fitted values as blue dots. As expected, the blue dots are monotonically increasing.
# training data
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))

# isotonic reg fit and plot
fit <- isoreg(x, y)
plot(x, y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
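To make the ordering issue concrete, here is a minimal sketch using the same training data. The names x_sorted, fit_sorted and fit_original are illustrative only; fitted() is the standard accessor and, for isoreg objects, should return the fitted values rearranged into the original data order.

```r
# Same training data and fit as above
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

# fit$x keeps the original order; fit$ord is the permutation that sorts it
x_sorted <- fit$x[fit$ord]

# fit$yf is aligned with the sorted x values, not with fit$x
fit_sorted <- data.frame(x = x_sorted, yf = fit$yf)

# fitted() gives the same values, aligned with the original order of x
fit_original <- data.frame(x = fit$x, yf = fitted(fit))

head(fit_sorted)
head(fit_original)
```

If you only need fitted values next to the original observations, fitted(fit) avoids the indexing by fit$ord altogether.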
Isotonic regression is one commonly used method for calibration: see this previous post for background on calibration and this link for more details with Python code. In this setting, we want the isotonic regression model to be a black box: we hand it uncalibrated predictions as input, and it returns calibrated predictions.
If you inspect the return value of the isoreg function, you will find that it cannot interact with new test data: it stores only the training data and its fitted values, and there is no predict method for it. Imagine that we have some new test data that we want to calibrate:
set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))
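As a quick check of the claim that the isoreg return value cannot score new data, the sketch below lists what the object stores and looks for an S3 predict method. It assumes a plain R session in which no loaded package registers its own predict method for class "isoreg"; predict_method is an illustrative name.

```r
# Same training data and fit as above
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

# Components of the fitted object: all of them describe the training data
names(fit)

# Look up an S3 predict method for class "isoreg";
# with optional = TRUE, getS3method() returns NULL when none is found
predict_method <- getS3method("predict", "isoreg", optional = TRUE)
is.null(predict_method)
```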
A naive and WRONG way to calibrate test_y would be to run isotonic regression on just the test data. The plot shows the fits for the training data as blue dots and the fits for the testing data as red squares: the overall fit is not monotonic.
# WRONG isotonic reg fit and plot
fit2 <- isoreg(test_x, test_y)
plot(x, y, pch = 4)
points(test_x, test_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit2$x[fit2$ord], fit2$yf, pch = 15, col = "red")
A second WRONG way to calibrate test_y would be to run isotonic regression on the combined training/test data. The plot shows that while the overall fit is monotonic, the predictions for the training data have shifted, i.e. you have changed the black box.
# WRONG isotonic reg fit and plot (v2)
all_x <- c(x, test_x)
all_y <- c(y, test_y)
fit3 <- isoreg(all_x, all_y)
plot(all_x, all_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit3$x[fit3$ord], fit3$yf, pch = 15, col = "red")
The CORRECT way to make the isotonic regression model into a black box is to pass the output of isoreg to the as.stepfun function, like so:
isofit <- as.stepfun(isoreg(x, y))
isofit is the black box we seek: a function that we give uncalibrated predictions to get calibrated predictions in return. As the plot below shows, the overall fit is still monotonic, and the calibrated predictions for the training data do not change.
plot(all_x, all_y, pch = 4)
points(all_x, isofit(all_x), pch = 15, col = "red")
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
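For readers who prefer numbers to plots, the following sketch probes the step function directly. The inputs in new_x are made-up values chosen to include points below and above the training range of x (2 to 30), and calibrated and train_pred are illustrative names. It checks the two properties described above and shows that, outside the training range, the step function stays flat at the smallest and largest fitted values.

```r
# Same training data, fit, and black box as above
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)
isofit <- as.stepfun(fit)

# Hypothetical new inputs, sorted, some outside the training range
new_x <- c(-5, 1, 7.5, 16, 29, 100)
calibrated <- isofit(new_x)

# The output is non-decreasing in the input
!is.unsorted(calibrated)

# Training inputs map back to the fitted values isoreg produced
train_pred <- isofit(fit$x[fit$ord])
all.equal(train_pred, fit$yf)

# Beyond the training range, predictions stay at the end values of the fit
range(fit$yf)
calibrated[c(1, length(calibrated))]
```

Whether flat extrapolation is appropriate depends on the application; it is simply what the step function does by construction.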