Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It seems to me that the kde1d package (One-Dimensional
Kernel Density Estimation) is not very known. I’ve never heard of it on
Stack Overflow, except in an answer of mine.
However this is a great package, IMHO. I’m going to show why I like it.
The d/p/q/r family
Estimating a density with the kde1d function returns a
kde1d object, and this makes available the density, the
distribution function, the quantile function associated to the density
estimate, as well as a sampler from the estimated distribution.
Let’s fit a density with kde1d to a simulated Gaussian
sample:
library(kde1d) set.seed(666) y <- rnorm(100) fit <- kde1d(y)
Here is the density estimate, in green, along with the true density, in blue:
opar <- par(mar = c(3, 1, 1, 1)) plot(NULL, xlim = c(-3.5, 3.5), ylim = c(0, 0.4), axes = FALSE, xlab = NA) axis(1, at = seq(-3, 3, by=1)) curve(dkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2) curve(dnorm(x), n = 300, add = TRUE, col = "blue", lwd = 2)
The density can even be evaluated outside the range of the data:
print(dkde1d(max(y)+1, fit)) ## [1] 0.001684873
The corresponding cumulative distribution function:
opar <- par(mar = c(4.5, 5, 1, 1))
plot(NULL, xlim = c(-3.5, 3.5), ylim = c(0, 1), axes = FALSE,
xlab = "x", ylab = expression("Pr("<="x)"))
axis(1, at = seq(-3, 3, by=1))
axis(2, at = seq(0, 1, by=0.25))
curve(pkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2)
curve(pnorm(x), n = 300, add = TRUE, col = "blue", lwd = 2)
The corresponding inverse cumulative distribution function is evaluated
by qkde1d, and rkde1d simulates from the
estimated distribution.
Bounded data
By default, the data supplied to the kde1d function is
assumed to be unbounded. For bounded data, use the
xmin and/or xmax options.
Estimating monotonic densities
Now, something I use to convince my folks that kde1d is
great. Consider a distribution having a monotonic density. The base
function density does not correctly estimate the density
(at least, with the default settings):
set.seed(666) y <- rbeta(100, 1, 4) opar <- par(mar = c(3, 1, 1, 1)) plot(NULL, xlim = c(0, 1), ylim = c(0, 4), axes = FALSE, xlab = NA) axis(1, at = seq(0, 1, by=0.2)) lines(density(y, from = 0, to = 1), col = "green", lwd = 2) curve(dbeta(x, 1, 4), n = 300, add = TRUE, col = "blue", lwd = 2)
The monotonic aspect of the density does not occur in the estimated
density. With kde1d, it does:
fit <- kde1d(y, xmin = 0, xmax = 1) opar <- par(mar = c(3, 1, 1, 1)) plot(NULL, xlim = c(0, 1), ylim = c(0, 4), axes = FALSE, xlab = NA) axis(1, at = seq(0, 1, by=0.2)) curve(dkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2) curve(dbeta(x, 1, 4), n = 300, add = TRUE, col = "blue", lwd = 2)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
