Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Kernel Density Plots are a type of plot that displays the distribution of values in a dataset using one continuous curve. They are similar to histograms, but they are even better at displaying the shape of a distribution since they aren’t affected by the number of bins used in the histogram. In this blog post, we will discuss what Kernel Density Plots are in simple terms, what they are useful for, and show several examples using both base R and ggplot2.
< section id="what-are-kernel-density-plots" class="level1">What are Kernel Density Plots?
Kernel Density Plots are a way to estimate the probability density function of a continuous random variable. They are sometimes referred to as a kernel density plot or kernel density estimation plot. The probability density function (PDF) is estimated using the observed data points in the theory underlying a density plot. Each data point is the center of a kernel function, usually a Gaussian (normal) kernel. The density estimate’s shape and width are determined by the kernel function.
< section id="what-are-kernel-density-plots-useful-for" class="level1">What are Kernel Density Plots useful for?
Kernel Density Plots are useful for visualizing the distribution of a dataset. They can be used to identify the shape of the distribution, including whether it is symmetric or skewed, and whether it has one or more peaks. They can also be used to compare the distributions of two or more datasets.
< section id="examples" class="level1">Examples
< section id="examples-using-base-r" class="level2">Examples using base R
To create a Kernel Density Plot in base R, we can use the density()
function to estimate the density and the plot()
function to plot it. Here’s an example:
# Generate data set.seed(1234) x <- rnorm(500) # Estimate density dens <- density(x) # Plot density plot(dens)
This will generate a Kernel Density Plot of the x
dataset.
We can also overlay the density curve over a histogram using the lines()
function. Here’s an example:
# Generate data set.seed(1234) x <- rnorm(500) # Plot histogram hist(x, freq = FALSE) # Estimate density dens <- density(x) # Overlay density curve lines(dens, col = "red")
This will generate a histogram with a Kernel Density Plot overlaid on top.
< section id="examples-using-ggplot2" class="level2">Examples using ggplot2
To create a Kernel Density Plot in ggplot2, we can use the geom_density()
function. Here’s an example:
# Load ggplot2 package library(ggplot2) # Generate data set.seed(1234) x <- rnorm(500) # Create data frame df <- data.frame(x = x) # Plot density ggplot(df, aes(x = x)) + geom_density() + theme_minimal()
This will generate a Kernel Density Plot of the x
dataset using ggplot2.
We can also customize the plot by changing the color and fill of the density curve. Here’s an example:
# Generate data set.seed(1234) x <- rnorm(500) # Create data frame df <- data.frame(x = x) # Plot density ggplot(df, aes(x = x)) + geom_density(color = "red", fill = "blue", alpha = 0.328) + theme_minimal()
This will generate a Kernel Density Plot of the x
dataset using ggplot2 with a red line, blue fill, and 33% transparency.
Example using TidyDensity
I have posted on it before but TidyDensity can also help in creating density plots for data that use the tidy_
distribution functions with its own autoplot function. Let’s take a look at an example using the same data as above.
library(TidyDensity) set.seed(1234) tn <- tidy_normal(.n = 500) tn |> tidy_autoplot()
Now let’s see it with different means on the same chart.
set.seed(1234) tidy_multi_single_dist( .tidy_dist = "tidy_normal", .param_list = list( .n = 500, .mean = c(-2, 0, 2), .sd = 1, .num_sims = 1 ) ) |> tidy_multi_dist_autoplot()
And one final one with multiple simulations of each distribution.
set.seed(1234) tidy_multi_single_dist( .tidy_dist = "tidy_normal", .param_list = list( .n = 500, .mean = c(-2,0,2), .sd = 1, .num_sims = 5 ) ) |> tidy_multi_dist_autoplot()
Conclusion
Kernel Density Plots are a useful tool for visualizing the distribution of a dataset. They are easy to create in both base R and ggplot2, and can be customized to fit your needs. We encourage readers to try creating their own Kernel Density Plots using the examples provided in this blog post.
< section id="references" class="level1">References
- [1] https://r-coder.com/density-plot-r/
- [2] https://www.geeksforgeeks.org/how-to-make-density-plots-with-ggplot2-in-r/
- [3] https://youtube.com/watch?v=oN6D8_ztl04
- [4] http://www.sthda.com/english/wiki/ggplot2-density-plot-quick-start-guide-r-software-and-data-visualization
- [5] https://www.tutorialspoint.com/how-to-manually-set-the-colors-of-density-plot-for-categories-in-r
- [6] https://www.statology.org/kernel-density-plot-r/
- [7] https://www.spsanderson.com/TidyDensity/reference/tidy_multi_single_dist.html
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.