Fast computation of cross-validation in linear models
The leave-one-out cross-validation statistic is given by
\[\text{CV} = \frac{1}{N} \sum_{i=1}^N e_{[i]}^2,\]
where $e_{[i]} = y_i - \hat{y}_{[i]}$, $y_1,\dots,y_N$ are the observations, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic.
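By way of illustration, here is a minimal R sketch of that naive computation, refitting the model once per observation. The built-in mtcars data and the formula mpg ~ wt + hp are arbitrary choices made only so the example runs; any linear model would do.

```r
# Naive leave-one-out CV: refit the model N times, each time with one case removed.
N <- nrow(mtcars)
e_loo <- numeric(N)
for (i in seq_len(N)) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])                     # model without case i
  e_loo[i] <- mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ])   # e_[i]
}
cv_naive <- mean(e_loo^2)
```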
It turns out that for linear models, we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
Suppose we have a linear regression model $\bm{Y} = \bm{X}\bm{\beta} + \bm{e}$. Then $\hat{\bm{\beta}} = (\bm{X}'\bm{X})^{-1}\bm{X}'\bm{Y}$ and $\bm{H} = \bm{X}(\bm{X}'\bm{X})^{-1}\bm{X}'$ is the “hat-matrix”. It has this name because it is used to compute $\hat{\bm{Y}} = \bm{H}\bm{Y}$. If the diagonal values of $\bm{H}$ are denoted by $h_1,\dots,h_N$, then the leave-one-out cross-validation statistic can be computed using
\[ \text{CV} = \frac{1}{N}\sum_{i=1}^N [e_{i}/(1-h_{i})]^2,\]
where $e_i = y_i - \hat{y}_i$ and $\hat{y}_i$ is the predicted value obtained when the model is estimated with all data included. This is a remarkable result, and is given without proof in Section 5.5 of my forecasting textbook.
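In R, everything in this formula is available from a single fit: residuals() gives the $e_i$ and hatvalues() gives the $h_i$. A short sketch, continuing the illustrative example above:

```r
# Fast leave-one-out CV from one fit on the full data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)
h <- hatvalues(fit)             # diagonal entries h_i of the hat matrix H
e <- residuals(fit)             # ordinary residuals e_i
cv_fast <- mean((e / (1 - h))^2)
all.equal(cv_fast, cv_naive)    # agrees with the N-refit version above
```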
I am teaching my second-year forecasting class about this result tomorrow, and I thought my students might like to see the proof. What follows is the simplest proof I know (adapted from Seber and Lee, 2003).
Proof
Let $\bm{X}_{[i]}$ and $\bm{Y}_{[i]}$ be similar to $\bm{X}$ and $\bm{Y}$ but with the $i$th row deleted in each case. Let $\bm{x}_i'$ be the $i$th row of $\bm{X}$ and let
\[ \hat{\bm{\beta}}_{[i]} = (\bm{X}_{[i]}'\bm{X}_{[i]})^{-1}\bm{X}_{[i]}'\bm{Y}_{[i]} \]
be the estimate of $\bm{\beta}$ without the $i$th case. Then $e_{[i]} = y_i - \bm{x}_i'\hat{\bm{\beta}}_{[i]}$.
Now $\bm{X}_{[i]}'\bm{X}_{[i]} = (\bm{X}'\bm{X} - \bm{x}_i\bm{x}_i')$ and $\bm{x}_i'(\bm{X}'\bm{X})^{-1}\bm{x}_i = h_i$. So by the Sherman–Morrison–Woodbury formula,
\[ (\bm{X}_{[i]}'\bm{X}_{[i]})^{-1} = (\bm{X}'\bm{X})^{-1} + \frac{(\bm{X}'\bm{X})^{-1}\bm{x}_i\bm{x}_i'(\bm{X}'\bm{X})^{-1}}{1-h_i}. \]
Also note that $\bm{X}_{[i]}'\bm{Y}_{[i]} = \bm{X}'\bm{Y} - \bm{x}_i y_i$. Therefore
\begin{align*}
\hat{\bm{\beta}}_{[i]}
&= \left[ (\bm{X}'\bm{X})^{-1} + \frac{ (\bm{X}'\bm{X})^{-1}\bm{x}_i\bm{x}_i'(\bm{X}'\bm{X})^{-1} }{1-h_i} \right] (\bm{X}'\bm{Y} - \bm{x}_i y_i)\\
&= \hat{\bm{\beta}} - \left[ \frac{ (\bm{X}'\bm{X})^{-1}\bm{x}_i}{1-h_i}\right] \left[y_i(1-h_i) - \bm{x}_i' \hat{\bm{\beta}} + h_i y_i \right]\\
&= \hat{\bm{\beta}} - (\bm{X}'\bm{X})^{-1}\bm{x}_i e_i / (1-h_i).
\end{align*}
Thus
\begin{align*}
e_{[i]} &= y_i - \bm{x}_i'\hat{\bm{\beta}}_{[i]} \\
&= y_i - \bm{x}_i' \left[ \hat{\bm{\beta}} - (\bm{X}'\bm{X})^{-1}\bm{x}_i e_i/(1-h_i)\right] \\
&= e_i + h_i e_i/(1-h_i) \\
&= e_i/(1-h_i),
\end{align*}
and the result follows.
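If you want to convince yourself numerically, the coefficient identity in the second-last display is easy to check for a single case. A small sketch, again using the illustrative fit from above:

```r
# Check: beta_[i] = beta_hat - (X'X)^{-1} x_i e_i / (1 - h_i), for case i = 1.
X <- model.matrix(fit)
XtX_inv <- solve(crossprod(X))        # (X'X)^{-1}
i <- 1
beta_refit    <- coef(lm(mpg ~ wt + hp, data = mtcars[-i, ]))
beta_downdate <- coef(fit) - drop(XtX_inv %*% X[i, ]) * e[i] / (1 - h[i])
all.equal(beta_refit, beta_downdate)  # the two estimates coincide
```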
This result is implemented in the CV() function from the forecast package for R.
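For instance, assuming the forecast package is installed, something like the following should report the statistic for the fit above:

```r
library(forecast)
CV(fit)   # CV statistic for the lm object, alongside information criteria
```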