Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.
Here are the links to get set up.
What are Marginal Distributions?
And how can I use them to uncover complex relationships?
Marginal Distribution (Density) plots extend a plot of numeric data with side panels that show how each variable is distributed. Densities are the usual choice, but histograms and box plots work too.
Marginal Distribution Plots were made popular with the seaborn jointplot() side-panels in Python. These add side plots that highlight distributions.
How do we make them in ggplot2?
Marginal distributions can now be made in R using ggside, a new ggplot2 extension. You can make linear regression plots with marginal distributions using histograms, densities, box plots, and more. Bonus: the side panels are super customizable for uncovering complex relationships.
Here are two examples of what you can (and will) do in this tutorial!
Example 1:
Linear Regression with Marginal Distribution (Density) Side-Plots (Top and Right)
Example 2:
Facet-Plot with Marginal Box Plots (Top)
Before we get started, get the Cheat Sheet
ggside is great for making marginal distribution side plots. But you’ll still need to learn how to visualize data with ggplot2. For those topics, I’ll use the Ultimate R Cheat Sheet to refer to ggplot2 code in my workflow.
Quick Example:
Download the Ultimate R Cheat Sheet. Then click the “CS” next to “ggplot2”, which opens the Data Visualization with ggplot2 Cheat Sheet.
Now you’re ready to quickly reference ggplot2 functions.
Load Libraries & Data
The libraries we’ll need today are ggside and tidyverse (which loads ggplot2). Both packages are available on CRAN and can be installed with install.packages(). Note: I’m using the development version of ggside, which is what I recommend in the YouTube Video.
The dataset is the mpg data that comes with ggplot2.
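A minimal setup sketch, assuming ggside and tidyverse are installed; these two packages cover the plots in this tutorial. The GitHub repository named in the comment is my assumption about where the development version lives.

```r
# One-time installation (uncomment as needed):
# install.packages(c("tidyverse", "ggside"))     # CRAN releases
# remotes::install_github("jtlandis/ggside")     # development version (assumed repo)

library(tidyverse)  # loads ggplot2, dplyr, and friends
library(ggside)     # side-panel geoms for ggplot2

# mpg ships with ggplot2: fuel economy data for popular car models
mpg <- ggplot2::mpg

# Quick look at the columns used later: hwy, cty, class, and cyl
glimpse(mpg)
```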
Plot 1. Linear Regression with Marginal Distribution Plot
Replicating Seaborn’s jointplot()
We’ll start by replicating what you can do with Python’s Seaborn jointplot(). We’ll accomplish this with ggside::geom_xsidedensity().
We set up the plot just like a normal ggplot.
Refer to the Ultimate R Cheat Sheet for:
ggplot()
geom_point()
geom_smooth()
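As a rough sketch of that starting point, the three functions above combine like this. The column choices (hwy, cty, class) and the styling values are my assumptions for illustration, not necessarily the exact settings used in the video.

```r
library(ggplot2)

# Scatter plot of highway vs. city fuel economy, colored by vehicle class
base_plot <- ggplot(mpg, aes(x = hwy, y = cty, color = class)) +
  geom_point(size = 2, alpha = 0.3) +
  # aes(color = NULL) drops the per-class grouping so a single
  # smoother is fit across all vehicles
  geom_smooth(aes(color = NULL), se = TRUE)

base_plot
```

With no method supplied, geom_smooth() picks its default smoother for the data size, so the fitted line is not necessarily a straight one.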
Next we add from ggside:
- geom_xsidedensity(): adds a side density panel (top panel).
- geom_ysidedensity(): adds a side density panel (right panel).
The trick is mapping the side panel’s free axis to after_stat(density), which makes an awesome-looking marginal density side panel. I increased the size of the marginal density panels with theme(ggside.panel.scale.x).
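Putting the pieces together, a sketch of the full plot might look like the following. The alpha, stacking, label text, and 0.4 panel scale are illustrative assumptions; ggside.panel.scale.y is added here so the right panel is enlarged to match the top one.

```r
library(ggplot2)
library(ggside)

p_density <- ggplot(mpg, aes(x = hwy, y = cty, color = class)) +
  geom_point(size = 2, alpha = 0.3) +
  geom_smooth(aes(color = NULL), se = TRUE) +
  # Top panel: density of hwy; the panel's y axis is the computed density
  geom_xsidedensity(
    aes(y = after_stat(density), fill = class),
    alpha = 0.5, position = "stack"
  ) +
  # Right panel: density of cty; the panel's x axis is the computed density
  geom_ysidedensity(
    aes(x = after_stat(density), fill = class),
    alpha = 0.5, position = "stack"
  ) +
  labs(
    title = "Fuel Economy by Vehicle Type",
    x = "Highway MPG", y = "City MPG"
  ) +
  # Side panels sized relative to the main panel
  theme(
    ggside.panel.scale.x = 0.4,
    ggside.panel.scale.y = 0.4
  )

p_density
```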
Loess Regression w/ Marginal Density
We generate the regression plot with marginal distributions (density) to highlight key differences between the automobile classes. We can see:
- Pickup, SUV – Have the lowest Highway Fuel Economy (MPG)
- 2seater, Compact, Midsize, Subcompact – Have the highest Highway Fuel Economy
Need help learning ggplot2?
In the R for Business Analysis (DS4B 101-R) Course, I teach 5 hours on ggplot2 alone. Learn:
- Geometries
- Scales
- Themes
- And advanced customizations: Labeled Heat Maps and Lollipop Charts
Plot 2. Faceted Side-Panels
Next, let’s try out some advanced functionality. I want to see how ggside handles faceted plots, which are subplots that vary based on a categorical feature. We’ll use the “cyl” column to facet, which is for engine size (number of cylinders).
Faceted Side Panels? No problem.
Awesome! I have included facets by “cyl”, which creates four plots based on the engine size. ggside picked up on the facets and has made 4 side-panel plots.
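A sketch of a faceted plot with a box plot side panel on top, under a few assumptions of mine: the box plots are grouped by class, scale_xsidey_discrete() is included because some ggside versions need it when the main y axis is continuous (recent versions may not), and method = "lm" is swapped in so facets with only a handful of cars still fit cleanly. Drop the method argument to get ggplot2's default smoother.

```r
library(ggplot2)
library(ggside)

p_facet <- ggplot(mpg, aes(x = cty, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(aes(color = NULL), method = "lm") +
  # Horizontal box plots of cty per class, drawn in the top side panel
  geom_xsideboxplot(aes(y = class), orientation = "y", alpha = 0.5) +
  # Make the side panel's y axis discrete (one row per class)
  scale_xsidey_discrete() +
  # One column of panels per cylinder count
  facet_grid(cols = vars(cyl), scales = "free_x") +
  labs(title = "Fuel Economy by Engine Size (Cylinders)") +
  theme(ggside.panel.scale.x = 0.4)

p_facet
```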
Amazing. ggside just works.
Congrats. You just quickly made two report-quality plots with ggplot2 and ggside. Excellent work.
But it gets better
You’ve just scratched the surface.
What is the best way to become proficient in data science?
You’re probably thinking:
- There’s so much to learn.
- My time is precious.
I have good news that will put those doubts behind you.
You can learn data science with my state-of-the-art Full 5 Course R-Track System.
Become the data science expert in your organization.
Get Five of our Premium R Courses that Build Expert-Level Machine Learning Skills, Web Application Skills, & Time Series Skills.
Full 5 Course R-Track System
Taking these courses is equivalent to:
- 9-Months of Methodical Code-Based Learning.
- 250+ tool-based MOOC courses.
- Education Comparable to 9-months of University Courses.
- 5 end-to-end projects.
- 5 frameworks.
Unlock the Full 5 Course R-Track System