[This article was first published on mickeymousemodels, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Earlier today I read a post about truncated normals, and one plot in particular jumped out at me:Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By definition, the truncated normal should have zero density everywhere to the left of the truncation point, but that’s not what we see in the plot. What’s going on? The catch is that this isn’t a plot of the true density, but rather a plot of the kernel density estimate returned by R’s density function. The KDE is essentially a mixture of a bunch of normal densities, one for each data point. Wikipedia provides a nice illustration: in the rightmost frame, the red densities are (uniformly) mixed to create the blue KDE.
What happens when this estimate is applied to draws from a truncated normal? Every point is replaced with its own little normal density, and the points / densities at the very edge of the truncation cutoff inevitably spill over into the zero-density region. The KDE returns a smooth function, whereas truncated normals actually have discontinuous densities. Here’s some R code to illustrate that point:
dnorm.truncated <- function(x, mean=0, sd=1, left=-1.5, right=0.50) { # Probability that left <= x <= right probability <- pnorm(right, mean=mean, sd=sd) - pnorm(left, mean=mean, sd=sd) return((x >= left & x <= right) * dnorm(x, mean=mean, sd=sd) / probability) } # Sanity check: does the density integrate to one? integrate(dnorm.truncated, -Inf, Inf) integrate(dnorm.truncated, -1.5, 0.50) dev.new(height=6, width=8) x.coords <- seq(-3, 3, by=0.005) plot(x.coords, dnorm(x.coords), ylim=c(0, 0.65), col="red", type="l", lwd=2, xlab="x", ylab="density", main="Two Density Functions") lines(x.coords, dnorm.truncated(x.coords), col="darkslateblue", lwd=2) legend("topleft", "Truncated Normal", bty="n", col="darkslateblue", lty=1, lwd=1.5) legend("topright", "Standard Normal", bty="n", col="red", lty=1, lwd=1.5) savePlot("standard_and_truncated_normal_densities.png") # Compare kernel density estimate to true truncated density num.obs <- 1000 x <- rnorm(num.obs) x <- x[x >= -1.5 & x <= 0.50] plot(x.coords, dnorm.truncated(x.coords), ylim=c(0, 0.70), col="darkslateblue", type="l", lwd=2, xlab="x", ylab="density", main="Two Density Functions") lines(density(x), col="darkgray", lty=3, lwd=2) mtext(sprintf("Kernel density estimated from %s observations", num.obs)) legend("topleft", "Truncated Normal", bty="n", col="darkslateblue", lty=1, lwd=1.5) legend("topright", "Kernel Density Estimate", bty="n", col="darkgray", lty=3, lwd=1.5) savePlot("truncated_normal_and_kernel_density_estimate.png")
The first plot shows two densities: a standard normal, and a truncated standard normal where we drop any observation that falls outside of [-1.50, 0.50].
The second plot shows the same truncated normal, this time accompanied by its kernel density estimate. Note that the KDE spills over both the left and right truncation points.
That’s all for now. If you’d like to read a little more about kernel density estimates, check out this post by Andrew Gelman.
To leave a comment for the author, please follow the link and comment on their blog: mickeymousemodels.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.