Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Theory
< section id="aic" class="level3">AIC
Consider the AIC for the usual linear model
where
where:
and the second equality results from the Stirling approximation
where, according to standard assumptions,
Now consider two such models, with different covariate vectors
for
Assuming, without loss of generality, that
To gain some intuition, suppose that the set of variables contained in
Cross-entropy
In parallel to AIC, we can consider the exact “information criterion” provided by the model in-sample cross-entropy under the true data generating process. For a single linear model, the in-sample cross-entropy is:
(“in-sample” refers to the fact that we fix, i.e. condition, on the covariate vector of the training sample,
- The numerator and denominator are conditionally independent
variables with and degrees of freedom respectively. This can be seen by rewriting these as , and , respectively, where as usual. - For a
random variable with degrees of freedom we have .
Using these results, we can show that:
(an equation which is true by design of AIC).
Before rushing to the (wrong) conclusion that
A bit more pragmatically, in the real world we could assume the remainder of Equation 10 to be
Simulation
< section id="setup" class="level3">Setup
We take the data generating process to be:
with:
m <- 0.1 q <- 0 rxy <- function(n) { tibble( x = rnorm(n, sd = 1), y = m * x + q + rnorm(n, sd = 1) ) }
We compare the model with vs. without slope term (lm
objects. We also define a “Naive Information Criterion”
nic <- function(fit) { p <- length(coef(fit)) n <- nobs(fit) sigma_hat <- sigma(fit) * sqrt((n - p) / n) log(sigma_hat) } aic <- function(fit) { p <- length(coef(fit)) n <- nobs(fit) sigma_hat <- sigma(fit) * sqrt((n - p) / n) log(sigma_hat) + (p + 1) / n + 0.5 *(1 + log(2*pi)) } ce <- function(fit, data) { p <- length(coef(fit)) n <- nobs(fit) sigma_hat <- sigma(fit) * sqrt((n - p) / n) y_hat <- fitted(fit) mu <- data$x * m + q res <- 0 res <- res + 0.5 / (sigma_hat^2) res <- res + log(sigma_hat) res <- res + mean(0.5 * (y_hat - mu)^2 / (sigma_hat^2)) res <- res + 0.5 * log(2 * pi) return(res) }
From our results above, we expect:
The expected in-sample cross-entropies cannot be computed explicitly, but for relatively small
I will use tidyverse for plotting results.
library(dplyr) library(ggplot2)
In order to make results reproducible let’s:
set.seed(840)
Results
We simulate fitting models
fits <- tidyr::expand_grid( n = 10 ^ seq(from = 1, to = 3, by = 0.5), b = 1:1e3 ) |> mutate(data = lapply(n, rxy)) |> group_by(n, b, data) |> tidyr::expand(model = c(y ~ 1, y ~ x)) |> ungroup() |> mutate( fit = lapply(row_number(), \(i) lm(model[[i]], data = data[[i]])), ce = sapply(row_number(), \(i) ce(fit[[i]], data[[i]])), aic = sapply(fit, aic), nic = sapply(fit, nic), model = format(model) ) |> select(-c(fit, data))
The plots below show the dependence from sample size of
fits |> mutate( is_baseline = model == "y ~ 1", delta_ce = ce - ce[is_baseline], delta_aic = aic - aic[is_baseline], delta_nic = nic - nic[is_baseline], .by = c(n, b), ) |> filter(!is_baseline) |> summarise( `E( ΔCE )` = mean(delta_ce), `E( ΔAIC )` = mean(delta_aic), `E( ΔNIC )` = mean(delta_nic), .by = n ) |> tidyr::pivot_longer( -n, names_to = "metric", values_to = "value" ) |> ggplot(aes(x = n, y = value, color = metric)) + geom_point() + geom_line() + geom_hline(yintercept = 0, linetype = "dashed") + geom_vline(aes(xintercept = 1 / m^2), linetype = "dotted") + scale_x_log10("Sample Size") + coord_cartesian(ylim = c(-0.025, 0.025)) + ylab(expression(IC)) + theme(legend.position = "bottom", legend.title = element_blank()) + ggtitle("AIC vs. in-sample cross-entropy", "Expected values") + NULL
fits |> filter(aic == min(aic), .by = c(n, b)) |> summarise(count = n(), .by = c(n, model)) |> ggplot(aes(fill = model, x = n, y = count)) + geom_col() + scale_x_log10("Sample Size") + ylab("Count") + theme(legend.position = "bottom") + ggtitle("AIC model selection frequencies")
fits |> filter(n %in% c(10, 100, 1000)) |> mutate(delta_aic = aic - aic[model == "y ~ 1"], .by = c(n, b)) |> filter(model != "y ~ 1") |> mutate(expec = -0.5 * log(1 + m^2) + 0.5 / n) |> ggplot(aes(x = delta_aic, color = as.factor(n))) + geom_density() + coord_cartesian(xlim = c(-0.1, NA)) + labs(x = "ΔAIC", y = "Density", color = "Sample Size") + ggtitle("ΔAIC probability density")
Finally, here is something I have no idea where it comes from. The plot below shows the scatterplot of in-sample cross-entropy differences vs. the AIC differences. It is well known that AIC only estimates the expectation of these differences, averaged over potential training samples. One may ask whether AIC has anything to say about the actual cross-entropy difference for the estimated models, conditional on the realized training sample.
Assuming I have made no errors here, the tilted-U shape of this scatterplot is a clear negative answer. What’s especially interesting is that, apparently, these differences have a negative correlation. I fail to see where do the negative correlation and the U-shape come from.
fits |> filter(n == 100) |> mutate( is_baseline = model == "y ~ 1", delta_ce = ce - ce[is_baseline], delta_aic = aic - aic[is_baseline], .by = c(n, b), ) |> filter(!is_baseline) |> ggplot(aes(x = delta_aic, y = delta_ce)) + geom_point(size = 1, alpha = 0.2) + lims(x = c(-0.02, 0.01), y = c(-0.01, 0.03)) + labs(x = "ΔAIC", y = "ΔCE") + ggtitle("AIC vs. in-sample cross-entropy", "Point values for N = 100") + NULL
Footnotes
See e.g. 1503.06266↩︎
The same equation actually gives the expectation of
conditional to the in-sample covariate vector . Since this conditioning differs for the two different models involving and , in our comparison of expected values we must interpret this as unconditional expectations, in general.↩︎
Reuse
< section class="quarto-appendix-contents" id="quarto-citation">Citation
@online{gherardi2024, author = {Gherardi, Valerio}, title = {AIC in the Well-Specified Linear Model: Theory and Simulation}, date = {2024-05-17}, url = {https://vgherard.github.io/posts/2024-05-09-aic-in-the-well-specified-linear-model-theory-and-simulation/}, langid = {en} }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.