ggdist: Make a Raincloud Plot to Visualize Distribution in ggplot2

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. (You can report an issue about the content on this page here)

The ggdist package is a ggplot2 extension made for visualizing distributions and uncertainty. We’ll see how ggdist can be used to make a raincloud plot.

R-Tips Weekly

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.

Here are the links to get set up.

Video Tutorial
For those who prefer full YouTube video tutorials:

Learn how to use ggdist in our 7-minute YouTube video tutorial.

What is a Raincloud Plot?

The Raincloud Plot is a visualization that adds a half-density to a distribution plot. It gets its name because the density is in the shape of a “raincloud”. The raincloud (half-density) enhances the traditional boxplot by highlighting multiple modalities (an indicator that groups may exist). The boxplot does not show where densities are clustered, but the raincloud plot does!

Raincloud Plot (we’ll make this in the tutorial)

We’ll go through a short tutorial to get you up and running with ggdist to make a raincloud plot.

Raincloud Plots with ggdist [Tutorial]

This tutorial showcases the awesome power of ggdist for visualizing distributions.

Tutorial Credits

This tutorial wouldn’t be possible without another tutorial, Visualizing Distributions with Raincloud Plots by Cédric Scherer. Cédric is truly a ggplot2 master. Follow Cédric Scherer on Twitter to learn more about his excellent visualization work.

Before we get started, get the R Cheat Sheet

ggdist is great for extending ggplot2 with distributions. But, you’ll need to learn ggplot2 to take full advantage. For these topics, I’ll use the Ultimate R Cheat Sheet to refer to ggplot2 code in my workflow.

Quick Example:

Download the Ultimate R Cheat Sheet. Then click the “CS” hyperlink to “ggplot2”.


Now you’re ready to quickly reference the ggplot2 cheat sheet. This shows you the core plotting functions available in the ggplot2 library.

Onto the tutorial.

Load the Libraries and Data

First, run this code to:

  1. Load Libraries: Load ggdist, tidyquant, and tidyverse.
  2. Import Data: We’re using the mpg dataset that comes with ggplot2.

Get the code.
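
As a rough sketch of this setup step (assuming the three packages are already installed; the author's exact script is not reproduced here), the libraries and data can be loaded like this:

```r
# Load the three packages named in the tutorial
library(ggdist)     # distribution geoms and stats (stat_halfeye, stat_dots)
library(tidyquant)  # provides theme_tq(), used in the final step
library(tidyverse)  # loads ggplot2 and dplyr; ggplot2 ships the mpg dataset

# mpg is a built-in ggplot2 dataset with fuel economy data for 234 vehicles
mpg
```

The two columns used in the rest of the tutorial are cyl (number of cylinders) and hwy (highway miles per gallon).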

Raincloud Plot: Using ggplot

Next, we’ll make a Raincloud plot that highlights the distribution of Vehicle Fuel Economy (MPG) by Engine Size (Number of Cylinders). It helps if you have ggplot2 visualization experience. If you are interested in learning ggplot2 in-depth, check out our R for Business Analysis Course (DS4B 101-R) that contains over 30 hours of video lessons on learning R for data analysis.

Make the ggplot2 canvas

The first step is to make the ggplot2 canvas. We:

  1. Prep the Data: Using filter(), we isolate the most common (frequent) vehicle engine sizes.

  2. Map the columns: Using ggplot(), we map the cyl and hwy columns. We also convert the numeric cyl column to a discrete one with factor().

Get the code.
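
The two steps above can be sketched as follows. The choice of 4, 6, and 8 cylinders as the "most common" engine sizes is an assumption based on the counts in mpg (only a handful of cars have 5 cylinders), and the names mpg_common and p_canvas are just illustrative. The fill mapping is an optional extra so that each group gets its own color once layers are added.

```r
library(tidyverse)

# 1. Prep the data: keep the most common engine sizes.
#    Assumption: 4, 6, and 8 cylinders (mpg also has a few 5-cylinder cars).
mpg_common <- mpg %>%
    filter(cyl %in% c(4, 6, 8))

# 2. Map the columns: cyl (converted to a discrete factor) to x, hwy to y.
#    fill = factor(cyl) is optional and only matters once geoms are added.
p_canvas <- ggplot(
    mpg_common,
    aes(x = factor(cyl), y = hwy, fill = factor(cyl))
)

# Printing the object draws the empty canvas (axes only, no geometry yet)
p_canvas
```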

This produces a blank plot, which is the first layer. You can see that the x-axis is labeled “factor(cyl)” and the y-axis is “hwy” indicating the data has been mapped to the visualization.

Add the Rainclouds with stat_halfeye()

Next, we add our first geometry layer using ggdist::stat_halfeye(). This produces a Half Eye visualization, which contains a half-density (the slab) and a point-interval. We remove the point-interval by setting .width = 0 and point_colour = NA. The half-density remains.

Get the code.
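
A minimal sketch of this layer, under the same assumptions as before (engine sizes 4, 6, and 8; an optional fill mapping). The .width and point_colour settings come from the text; the justification value is an assumed one, chosen here to nudge the density away from the axis position and leave room for the boxplot.

```r
library(tidyverse)
library(ggdist)

p_cloud <- mpg %>%
    filter(cyl %in% c(4, 6, 8)) %>%
    ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
    ggdist::stat_halfeye(
        justification = -0.2,  # assumed value: shift the density off the axis position
        .width        = 0,     # zero-width interval, so no interval line is drawn
        point_colour  = NA     # hide the point estimate
    )

p_cloud
```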

And here’s the output. We can see the half-density distributions for fuel economy (hwy) by engine size (cyl).

Add the Boxplot with geom_boxplot()

Next, add the second geometry layer using ggplot2::geom_boxplot(). This produces a narrow boxplot. We reduce the width and adjust the opacity.

Get the code.
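
Continuing the sketch, the boxplot layer might look like this. The specific width and alpha values are assumptions (the text only says the width is reduced and the opacity adjusted), and hiding the outlier points is an optional choice made here because the dot plot added later shows every observation anyway.

```r
library(tidyverse)
library(ggdist)

p_box <- mpg %>%
    filter(cyl %in% c(4, 6, 8)) %>%
    ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
    # Layer 1: half-density without the point-interval
    ggdist::stat_halfeye(
        justification = -0.2,
        .width        = 0,
        point_colour  = NA
    ) +
    # Layer 2: narrow, semi-transparent boxplot
    geom_boxplot(
        width         = 0.12,  # assumed value: much narrower than the default
        alpha         = 0.5,   # assumed value: 50% opacity
        outlier.shape = NA     # optional: hide outlier points
    )

p_box
```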

And here’s the output. We now have a boxplot and half-density. We can see how the distributions vary compared to the median and interquartile range.

Add the Dot Plots with stat_dots()

Next, add the third geometry layer using ggdist::stat_dots(). This produces a half-dotplot, which is similar to a histogram that indicates the number of samples (number of dots) in each bin. We select side = "left" to indicate we want it on the left-hand side.

Get the code.
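
A sketch with the dot plot added as the third layer. Only side = "left" is taken from the text; the justification and binwidth values are assumptions that control how far the dots sit from the boxplot and how observations are grouped into bins.

```r
library(tidyverse)
library(ggdist)

p_dots <- mpg %>%
    filter(cyl %in% c(4, 6, 8)) %>%
    ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
    # Layer 1: half-density without the point-interval
    ggdist::stat_halfeye(
        justification = -0.2,
        .width        = 0,
        point_colour  = NA
    ) +
    # Layer 2: narrow, semi-transparent boxplot
    geom_boxplot(width = 0.12, alpha = 0.5, outlier.shape = NA) +
    # Layer 3: half-dotplot on the opposite side of the density
    ggdist::stat_dots(
        side          = "left",  # draw the dots on the left-hand side
        justification = 1.1,     # assumed value: push dots clear of the boxplot
        binwidth      = 0.25     # assumed value: bin size in hwy units
    )

p_dots
```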

And here’s the output. We now have the three main geometries completed.

Making the plot look professional

We can clean up our plot with a professional-looking theme using tidyquant::theme_tq(). We’ll also rotate it with coord_flip() to give it the raincloud appearance.

Get the code.
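
Putting it all together, the finished plot could be sketched as below. theme_tq() and coord_flip() come from the text; scale_fill_tq() and the labels in labs() are additions assumed here for a cleaner result, and all numeric values remain illustrative.

```r
library(ggdist)
library(tidyquant)
library(tidyverse)

p_final <- mpg %>%
    filter(cyl %in% c(4, 6, 8)) %>%
    ggplot(aes(x = factor(cyl), y = hwy, fill = factor(cyl))) +
    # Raincloud: half-density without the point-interval
    ggdist::stat_halfeye(
        justification = -0.2,
        .width        = 0,
        point_colour  = NA
    ) +
    # Narrow, semi-transparent boxplot
    geom_boxplot(width = 0.12, alpha = 0.5, outlier.shape = NA) +
    # Rain: half-dotplot on the opposite side
    ggdist::stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
    # Theme from tidyquant, with a matching fill palette (assumed extra)
    tidyquant::scale_fill_tq() +
    tidyquant::theme_tq() +
    # Illustrative labels, not taken from the original post
    labs(
        title = "Raincloud Plot",
        x     = "Engine Size (No. of Cylinders)",
        y     = "Highway Fuel Economy (MPG)",
        fill  = "Cylinders"
    ) +
    # Rotate so the densities sit above the dots, like clouds over rain
    coord_flip()

p_final
```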

We’ve just finalized our plot. We can see clearly that the distribution of the 6-cylinder is bi-modal, something you can’t tell with an ordinary boxplot. We should investigate why there are so many dots in 6-cylinder with low highway-fuel economy. We’ll save that for another R-Tip.

Summary

We learned how to make Raincloud Plots with ggdist. But, there’s a lot more to visualization.

It’s critical to learn how to visualize with ggplot2, which is the premier framework for data visualization in R.

If you’d like to learn ggplot2, data visualizations, and data science for business with R, then read on.

My Struggles with Learning Data Science

It took me a long time to learn data science. And I made a lot of mistakes as I fumbled through learning R. I specifically had a tough time navigating the ever-increasing landscape of tools and packages, trying to pick between R and Python, and getting lost along the way.

If you feel like this, you’re not alone.

In fact, that’s the driving reason that I created Business Science and Business Science University (You can read about my personal journey here).

What I found out is that:

  1. Data Science does not have to be difficult; it just has to be taught smartly.

  2. Anyone can learn data science fast, provided they are motivated.

How I can help

If you are interested in learning R and the ecosystem of tools at a deeper level, then I have a streamlined program that will get you past your struggles and improve your career in the process.

It’s called the 5-Course R-Track System. It’s an integrated system containing 5 courses that work together on a learning path. Through 5+ projects, you learn everything you need to help your organization: from data science foundations, to advanced machine learning, to web applications and deployment.

The result is that you break through previous struggles, learning from my experience & our community of 2000+ data scientists that are ready to help you succeed.

Ready to take the next step? Then let’s get started.




Top R-Tips Tutorials you might like:

  1. mmtable2: ggplot2 for tables
  2. ggside: Plot linear regression with marginal distributions
  3. DataEditR: Interactive Data Editing in R
  4. openxlsx: How to Automate Excel in R
  5. officer: How to Automate PowerPoint in R
  6. DataExplorer: Fast EDA in R
  7. esquisse: Interactive ggplot2 builder
  8. gghalves: Half-plots with ggplot2
  9. rmarkdown: How to Automate PDF Reporting
  10. patchwork: How to combine multiple ggplots

Want these tips every week? Join R-Tips Weekly.

