Last week, we had a “mid-term” exam for our introduction to statistical learning course. The question was simple: consider three points \((x_i,y_i)\), here \(\{(0,2),(2,2),(3,1)\}\). Considering linear models estimated with least squares techniques, what would the leave-one-out cross-validation MSE be?
I like this exercise since we can compute everything easily, by hand. Since at each step we remove a single observation, only two observations remain in the sample. And with two points, fitting a linear model is straightforward (whatever the technique considered). Here, we simply take the straight line that passes through the other two points. And once we have that straight line (without even having to minimize a sum of squared errors), we have the error committed on the omitted observation. This is exactly what we see in the drawing below
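To make the drawing concrete, here is a small by-hand sketch in R (the variable names and the slope/intercept computations are mine, added for illustration): for each omitted observation, we take the straight line through the two remaining points and look at the error at the omitted \(x_i\),

x = c(0,2,3)
y = c(2,2,1)
err = rep(NA, 3)
for(i in 1:3){
  # the two remaining points, once observation i is removed
  xr = x[-i]; yr = y[-i]
  # straight line through those two points
  slope = (yr[2] - yr[1]) / (xr[2] - xr[1])
  intercept = yr[1] - slope * xr[1]
  # error committed on the omitted observation
  err[i] = y[i] - (intercept + slope * x[i])
}
err          # -2, 2/3, -1
mean(err^2)  # 49/27, i.e. 1.814815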
In other words, the LOOCV MSE is here \({\displaystyle\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}^{(-i)}\right)^{2}}\), where, intuitively, \(\hat{Y}_{i}^{(-i)}\) denotes the prediction at \(x_i\) from the model fitted on the other \(n-1\) observations. Thus, here \({\displaystyle\operatorname{MSE}=\frac{1}{3}\Big(2^2+\frac{2^2}{3^2}+1^2\Big)=\frac{1}{27}\big(36+4+9\big)=\frac{49}{27}}\). Note that we can also use R to compute that quantity,
> x = c(0,2,3)
> y = c(2,2,1)
> df = data.frame(x=x, y=y)
> yp = rep(NA,3)
> for(i in 1:3){
+   reg = lm(y~x, data=df[-i,])
+   yp[i] = predict(reg, newdata=df)[i]
+ }
> 1/3*sum((yp-y)^2)
[1] 1.814815
which is precisely what we obtained, by hand.
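As an aside (not part of the original exercise), for least squares regression there is a classical shortcut: the leave-one-out residual equals the ordinary residual divided by \(1-h_{ii}\), where \(h_{ii}\) is the leverage of observation \(i\), so the LOOCV MSE can be obtained from a single fit on all the observations. A quick sketch, reusing the df above,

> reg = lm(y ~ x, data = df)
> # leave-one-out residuals via the leverage shortcut e_i / (1 - h_ii)
> loo = residuals(reg) / (1 - hatvalues(reg))
> mean(loo^2)
[1] 1.814815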