ggExtra: R package for adding marginal histograms to ggplot2
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
My first CRAN package, ggExtra
, contains several functions to enhance ggplot2, with the most important one being ggExtra::ggMarginal()
– a function that finally allows easily adding marginal density plots or histograms to scatterplots.
Availability
You can read the full README describing the functionality in detail or browse the source code on GitHub.
The package is available through both CRAN (install.packages("ggExtra")
) and GitHub (devtools::install_github("daattali/ggExtra")
).
Spoiler alert – final result
You can see a demo of what ggMarginal
can do and play around with it in this Shiny app.
Here is an example of how easy it is to add marginal histograms in ggplot2 using ggExtra::ggMarginal()
.
library(ggplot2)
# create dataset with 1000 normally distributed points
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
# create a ggplot2 scatterplot
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
# add marginal histograms
ggExtra::ggMarginal(p, type = "histogram")
Marginal plots in ggplot2 - The problem
Adding marginal histograms or density plots to ggplot2
seems to be a common issue. Any Google search will likely find several StackOverflow and R-Bloggers posts about the topic, with some of them providing solutions using base
graphics or lattice
. While there are some great answers about how to solve this for ggplot2
, they are usually very specific to the dataset in question and do not provide code that is easily reusable.
A simple drop-in function for adding marginal plots to ggplot2 did not exist, so I created one.
Marginal plots in ggplot2 - Basic idea
The main idea is to create the marginal plots (histogram or density) and then use the gridExtra
package to arrange the scatterplot and the marginal plots in a “2x2 grid” to achieve the desired visual output. An empty plot needs to be created as well to fill in one of the four grid corners. This basic approach can be implemented like this:
library(ggplot2)
library(gridExtra)
pMain <- ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()
pTop <- ggplot(mtcars, aes(x = wt)) +
geom_histogram()
pRight <- ggplot(mtcars, aes(x = mpg)) +
geom_histogram() + coord_flip()
pEmpty <- ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_blank() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
line = element_blank(),
panel.background = element_blank())
grid.arrange(pTop, pEmpty, pMain, pRight,
ncol = 2, nrow = 2, widths = c(3, 1), heights = c(1, 3))
This works, but it’s a bit tedious to write, so at first I just wanted a simple function to abstract all this ugly code away. This was the birth of ggMarginal
, which was later developed into the ggExtra
package, together with a few other functions.
The abstraction was done in a way that allows the user to either provide a ggplot2 scatterplot, or the dataset and variables. For example, the following two calls are equivalent:
ggExtra::ggMarginal(data = mtcars, x = "wt", y = "mpg")
ggExtra::ggMarginal(ggplot(mtcars, aes(wt, mpg)) + geom_point())
Marginal plots in ggplot2 - Next steps
As you can see, that basic plot works, but it is not very nice looking and can have some work done on it. A few things come to mind quickly:
- Remove the whitespace between the scatterplot and the marginal plots
- Remove the marginal plots background
- Remove the axis labels from the marginal plots
These are all very easy to add with various ggplot2::theme()
parameters, and adding these to a ggMarginal
function will already provide a nice useful function for adding marginal plots to ggplot2.
There are some more issues that could be addressed in order to make the function even more robust.
- The marginal plot doesn’t necessary have the same axes as the scatterplot - notice how the
mpg
axis range in the previous plot don’t match up between the scatterplot and the marginal histogram. - If the main plot has a title, then the right marginal plot will go “too high”.
- If the axis labels text is enlarged, then a similar issue happens - the marginal plots position will be out of sync with the main scatterplot.
- If the axis label is multiline, then a similar issue happens again.
The following plot illustrates all these problems. It was achieved with exactly the same code as before, but adding these 3 lines to pMain
definition:
theme_gray(35) +
ggtitle("Cars weight vs miles/gallon") +
xlab("carnweight")
Accounting for these issues is a little trickier and requires a bit of “dirty” code. To address these problems, I used ggplot_build()
, which is a handy function that can be used to retrieve information from a plot. Using ggplot_build
, it’s possible to look at the internals of a plot object and identify the axis range, the text size, etc. It’s importante to note that since these parameters are not provided via a direct function call, it’s not considered 100% safe to use them because there is no guarantee that the plot internals will always look the same way. I won’t post the code here because it’s long but you can view the source code of my solution on GitHub.
Lastly, a function that adds marginal plots to a ggplot2 scatterplot could benefit from a few more features to make it more complete:
- Support drawing a marginal plot only along the x or y axis, not necessarily both.
- Support making the marginal plot either a density plot or a histogram.
- Allow the user to set the marginal plot’s colour and relative size.
All of these features and more are implemented in ggExtra::ggMarginal
.
Here is an example of using a few more parameters:
library(ggplot2)
# create dataset with 500 normally distributed points
df <- data.frame(x = rnorm(500, 50, 3), y = rnorm(500, 50, 3))
# create a ggplot2 scatterplot
p <- ggplot(df, aes(x, y)) + geom_point() +
theme_bw(30) + ggtitle("500 random points")
# add marginal density along the y axis
ggExtra::ggMarginal(p, type = "density", margins = "y", size = 4, marginCol = "red")
Other functions in the ggExtra
package
ggExtra
provides with a few extra convenience functions:
removeGrid
- Remove grid lines from ggplot2. Minor grid lines are always removed, and the major x or y grid lines can be removed as well.rotateTextX
- Rotate x axis labels. Often times it is useful to rotate the x axis labels to be vertical if there are too many labels and they overlap.plotCount
- Plot count data with ggplot2. Quickly plot a bar plot of count (frequency) data that is stored in a table or data.frame.
Technical notes about using gridExtra
gridExtra
is a very useful package with two functions for showing multiple ggplot2 plots: arrangeGrob
and grid.arrange
. However, using these functions inside a package has proven to be difficult because of the way gridExtra
handles namespaces. A short discussion can be found on this StackOverflow post. While I do not completely undersand the underlying problem (I don’t fully understand package mports/depends/attaching/etc), I did find workarounds to the problems and would love feedback if anyone has any comments.
Problem 1: could not find function “ggplotGrob”
When trying to call gridExtra::grid.arrange()
without loading ggplot2
you get this error:
f <- function() {
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_blank()
gridExtra::grid.arrange(p1)
}
f()
> Error: could not find function "ggplotGrob"
My workaround is to ensure ggplot2
is loaded:
f <- function() {
if (!"package:ggplot2" %in% search()) {
suppressPackageStartupMessages(attachNamespace("ggplot2"))
on.exit(detach("package:ggplot2"))
}
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_blank()
gridExtra::grid.arrange(p1)
}
f()
I know it’s hacky so I would appreciate better solutions.
Problem 2: No layers in plot
The problem with grid.arrange
is that it returns NULL
and does not allow the plot to be saved to an object. arrangeGrob
is a similar function that returns the object. But substituting arrangeGrob
for grid.arrange
gives an error
f <- function() {
if (!"package:ggplot2" %in% search()) {
suppressPackageStartupMessages(attachNamespace("ggplot2"))
on.exit(detach("package:ggplot2"))
}
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_blank()
(gridExtra::arrangeGrob(p1))
}
f()
> Error: No layers in plot
This error happens only if gridExtra
is not loaded, and it’s because printing the object is done after the function returns and uses a custom print method. So the solution is to add a class to the return object and add a print generic that ensures the object will print correctly.
f <- function() {
if (!"package:ggplot2" %in% search()) {
suppressPackageStartupMessages(attachNamespace("ggplot2"))
on.exit(detach("package:ggplot2"))
}
p1 <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_blank()
grob <- gridExtra::arrangeGrob(p1)
class(grob) <- c("mygrob", class(grob))
grob
}
print.mygrob <- function(x, ...) {
grid::grid.draw(x)
}
f()
These were my solutions to the gridExtra
problems that I implemented in ggExtra
, but I would appreciate feedback on other approaches.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.