[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
In model development, the "leave-one-out" (LOO) prediction is a form of cross-validation, calculated as below:
1. After a model is developed, each observation used in the model development is removed in turn, and the model is refitted with the remaining observations.
2. The out-of-sample prediction from each refitted model is then calculated for the corresponding removed observation; assembled one by one, these form the leave-one-out predicted values for the whole model development sample.
The loo_predict() function below is a general routine to calculate LOO predictions for any GLM object, which can be further employed to investigate model stability and predictive performance.
> pkgs <- c('doParallel', 'foreach')
> lapply(pkgs, require, character.only = T)
[[1]]
[1] TRUE

[[2]]
[1] TRUE

> registerDoParallel(cores = 8)
>
> data(AutoCollision, package = "insuranceData")
> # A GAMMA GLM #
> model1 <- glm(Severity ~ Age + Vehicle_Use, data = AutoCollision, family = Gamma(link = "log"))
> # A POISSON GLM #
> model2 <- glm(Claim_Count ~ Age + Vehicle_Use, data = AutoCollision, family = poisson(link = "log"))
>
> loo_predict <- function(obj) {
+   yhat <- foreach(i = 1:nrow(obj$data), .combine = rbind) %dopar% {
+     predict(update(obj, data = obj$data[-i, ]), obj$data[i, ], type = "response")
+   }
+   return(data.frame(result = yhat[, 1], row.names = NULL))
+ }
> # TEST CASE 1
> test1 <- loo_predict(model1)
> test1$result
 [1] 303.7393 328.7292 422.6610 375.5023 240.9785 227.6365 288.4404 446.5589
 [9] 213.9368 244.7808 278.7786 443.2256 213.9262 243.2495 266.9166 409.2565
[17] 175.0334 172.0683 197.2911 326.5685 187.2529 215.9931 249.9765 349.3873
[25] 190.1174 218.6321 243.7073 359.9631 192.3655 215.5986 233.1570 348.2781
> # TEST CASE 2
> test2 <- loo_predict(model2)
> test2$result
 [1]  11.15897  37.67273  28.76127  11.54825  50.26364 152.35489 122.23782
 [8]  44.57048 129.58158 465.84173 260.48114 107.23832 167.40672 510.41127
[15] 316.50765 121.75804 172.56928 546.25390 341.03826 134.04303 359.30141
[22] 977.29107 641.69934 251.32547 248.79229 684.86851 574.13994 238.42209
[29] 148.77733 504.12221 422.75047 167.61203
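As one possible follow-up, the LOO predictions can be compared against the in-sample fitted values to gauge how sensitive the fit is to individual observations. The sketch below uses the model1 and test1 objects from above; the two comparison measures (correlation and mean absolute percentage difference) are only illustrative choices, not part of loo_predict() itself.

# COMPARE LOO PREDICTIONS WITH IN-SAMPLE FITTED VALUES FOR THE GAMMA GLM
insample <- predict(model1, type = "response")
# CORRELATION BETWEEN IN-SAMPLE AND LEAVE-ONE-OUT PREDICTIONS
cor(insample, test1$result)
# MEAN ABSOLUTE PERCENTAGE DIFFERENCE AS A ROUGH STABILITY MEASURE
mean(abs(test1$result - insample) / insample)

A correlation close to 1 and a small percentage difference would suggest the model's predictions are stable under removal of any single observation.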