Scatterplot with marginal boxplots
Using R and ggplot2 to draw a scatterplot with the two marginal boxplots
Drawing a scatterplot with the marginal boxplots (or marginal histograms or marginal density plots) has always been a bit tricky (well for me anyway). The approach I take here is, first, to draw the three separate plots using ggplot2:
- the scatterplot;
- the horizontal boxplot to appear in the top margin;
- the vertical boxplot to appear in the right margin;
then second, to set widths and heights of the spaces used for axis and tick mark labels, and to combine the three plots using functions from the gtable package. The difficulty has been to ensure that the tick mark labels in the scatterplot panel and in the top marginal boxplot panel take up the same space. Functions from the gtable package make this a reasonably straightforward process.
To draw the following chart, I borrowed and modified code from here and here. The final code and data are available on GitHub.
Drawing the plot
This example uses the mtcars dataframe, available in base R. For convenience, the file mtcars marginal boxplots.R on GitHub contains all the code. First, load the ggplot2, gtable and grid packages and the mtcars dataframe. The grid package supplies unit(), unit.pmax() and the grid drawing functions used later.
library(ggplot2)
library(gtable)
library(grid)
data(mtcars)
Draw the scatterplot.
The plot margins are adjusted so that the spaces between the panels are reduced. Also, there is an ever-so-slight mismatch of the gridlines across the panels. The way to fix it is to remove the default offset on each axis (expand = c(0, 0)), then select an offset of your choice (expand_limits(...)). Here the offset is 10% of the data range at each end of each axis. Similar adjustments are made to the marginal plots.
p1 <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  # Remove the default axis padding
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  # Pad each axis by 10% of the data range
  expand_limits(y = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)),
                      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) +
  expand_limits(x = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)),
                      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) +
  # Margins are given in the order top, right, bottom, left
  theme(plot.margin = unit(c(0.2, 0.2, 0.5, 0.5), "lines"))
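The limit arithmetic above is repeated for every axis, so it may help to see it in isolation. The following is a minimal sketch using base R only; pad_range() is a hypothetical helper introduced here for illustration and is not part of the original code.

```r
# Hypothetical helper (not in the original post): widen the range of x
# by a fraction `frac` of its span at each end.
pad_range <- function(x, frac = 0.1) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * frac * diff(r)
}

data(mtcars)

# Same values as the min(...) - 0.1 * diff(range(...)) expressions above
mpg_limits <- pad_range(mtcars$mpg)
hp_limits  <- pad_range(mtcars$hp)

print(mpg_limits)
print(hp_limits)
```

With such a helper, the two expand_limits() calls could presumably be shortened to expand_limits(x = pad_range(mtcars$mpg), y = pad_range(mtcars$hp)).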
Draw the marginal boxplots
Note that the margins and axis offsets are adjusted to match those in the scatterplot. Also, the tick mark labels and axis titles for the x-axis and the y-axis are removed.
# Horizontal marginal boxplot - to appear at the top of the chart
p2 <- ggplot(mtcars, aes(x = factor(1), y = mpg)) +
  geom_boxplot(outlier.colour = NA) +
  geom_jitter(position = position_jitter(width = 0.05)) +
  scale_y_continuous(expand = c(0, 0)) +
  expand_limits(y = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)),
                      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) +
  coord_flip() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        plot.margin = unit(c(1, 0.2, -0.5, 0.5), "lines"))

# Vertical marginal boxplot - to appear at the right of the chart
p3 <- ggplot(mtcars, aes(x = factor(1), y = hp)) +
  geom_boxplot(outlier.colour = NA) +
  geom_jitter(position = position_jitter(width = 0.05)) +
  scale_y_continuous(expand = c(0, 0)) +
  expand_limits(y = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)),
                      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        plot.margin = unit(c(0.2, 1, 0.5, -0.5), "lines"))
Get the gtables for the three plots
gt1 <- ggplot_gtable(ggplot_build(p1))
gt2 <- ggplot_gtable(ggplot_build(p2))
gt3 <- ggplot_gtable(ggplot_build(p3))
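Before changing any widths or heights, it can be useful to look at what a gtable contains. The sketch below builds a plain scatterplot (the object names p and g are chosen here for illustration) and prints its layout table, which records the cells occupied by each component. The exact rows and columns reported depend on the installed ggplot2 version.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
g <- ggplot_gtable(ggplot_build(p))

# One row per grob: its name and the cells it spans (top, left, bottom, right)
print(g$layout[, c("name", "t", "l", "b", "r")])

# Number of columns and rows in the plot's layout
n_cols <- length(g$widths)
n_rows <- length(g$heights)
print(c(columns = n_cols, rows = n_rows))
```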
Set the maximum widths and heights for x-axis and y-axis titles and text
The gtables store information required to draw the plots, including the widths of the spaces occupied by the y-axis titles and tick mark labels. The code gets the maximum widths of these spaces for the scatterplot and the horizontal marginal boxplot (gt1 and gt2), then sets that maximum as the width in the two gtables. So that there are no problems with the vertical alignment of the scatterplot and the vertical marginal boxplot, the heights are similarly set for gt1 and gt3. The column indices 2:3 and row indices 4:5 reflect the gtable layout produced by the ggplot2 version used when this was written; later versions arrange the layout differently, so check gt1$layout before relying on them.
# Get maximum widths and heights
maxWidth <- unit.pmax(gt1$widths[2:3], gt2$widths[2:3])
maxHeight <- unit.pmax(gt1$heights[4:5], gt3$heights[4:5])

# Set the maximums in the gtables for gt1, gt2 and gt3
gt1$widths[2:3] <- as.list(maxWidth)
gt2$widths[2:3] <- as.list(maxWidth)
gt1$heights[4:5] <- as.list(maxHeight)
gt3$heights[4:5] <- as.list(maxHeight)
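The hard-coded indices above assume a particular gtable layout. As a hedged alternative, the indices can be looked up from the layout names instead. The sketch below assumes the y-axis grobs are named "axis-l" and "ylab-l" (or "ylab" in older releases) and the x-axis grobs "axis-b" and "xlab-b" (or "xlab"). The helpers layout_cols() and layout_rows() and the objects g1, g2, y_cols, x_rows and max_w are hypothetical names introduced here, not part of the original code.

```r
library(ggplot2)
library(grid)

# Hypothetical helpers: columns / rows occupied by grobs whose names match a pattern
layout_cols <- function(g, pattern) {
  sort(unique(g$layout$l[grepl(pattern, g$layout$name)]))
}
layout_rows <- function(g, pattern) {
  sort(unique(g$layout$t[grepl(pattern, g$layout$name)]))
}

g1 <- ggplot_gtable(ggplot_build(
  ggplot(mtcars, aes(mpg, hp)) + geom_point()
))
g2 <- ggplot_gtable(ggplot_build(
  ggplot(mtcars, aes(factor(1), mpg)) + geom_boxplot() + coord_flip()
))

# Columns holding the left axis and its title; rows holding the bottom axis and its title
y_cols <- layout_cols(g1, "^(axis-l|ylab-l|ylab$)")
x_rows <- layout_rows(g1, "^(axis-b|xlab-b|xlab$)")
print(y_cols)
print(x_rows)

# Element-wise maximum of the matching widths in the two gtables
max_w <- unit.pmax(g1$widths[y_cols], g2$widths[y_cols])
```

The resulting y_cols and x_rows could then stand in for 2:3 and 4:5 in the code above, provided both plots share the same layout.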
Combine the scatterplot with the two marginal boxplots
The following code creates a new gtable (gt), inserts the modified gt1, gt2 and gt3 into the new gtable, then renders the plot according to the information stored in the new gtable. Finally, a box is drawn around the combined plot.
# Create a new gtable with two columns (7:1) and two rows (1:7)
gt <- gtable(widths = unit(c(7, 1), "null"), heights = unit(c(1, 7), "null"))

# Insert gt1, gt2 and gt3 into the new gtable (arguments are row, then column)
gt <- gtable_add_grob(gt, gt1, 2, 1)
gt <- gtable_add_grob(gt, gt2, 1, 1)
gt <- gtable_add_grob(gt, gt3, 2, 2)

# And render the plot
grid.newpage()
grid.draw(gt)

# Draw a box around the combined plot
grid.rect(x = 0.5, y = 0.5, height = 0.995, width = 0.995, default.units = "npc",
          gp = gpar(col = "black", fill = NA, lwd = 1))
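To see how the 2 x 2 gtable allocates space without involving ggplot2 at all, the same layout can be filled with plain rectangles. This is only an illustrative sketch: gt_demo, the grob names and the fill colours are assumptions made here, not part of the original code.

```r
library(grid)
library(gtable)

# Same proportions as above: a wide left column and a tall bottom row
gt_demo <- gtable(widths = unit(c(7, 1), "null"),
                  heights = unit(c(1, 7), "null"))

# Place one rectangle where each plot would go (t = row, l = column)
gt_demo <- gtable_add_grob(gt_demo, rectGrob(gp = gpar(fill = "grey80")),
                           t = 2, l = 1, name = "main")
gt_demo <- gtable_add_grob(gt_demo, rectGrob(gp = gpar(fill = "steelblue")),
                           t = 1, l = 1, name = "top")
gt_demo <- gtable_add_grob(gt_demo, rectGrob(gp = gpar(fill = "tomato")),
                           t = 2, l = 2, name = "right")

grid.newpage()
grid.draw(gt_demo)
```

The "null" units share the available space proportionally, so the main cell takes seven eighths of both the width and the height. The top-right cell is left empty, as in the combined plot.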
Similar logic applies to the drawing of marginal density plots. The code shown below is also available in the file mtcars marginal density plots.R on GitHub.
# Main scatterplot
p1 <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  expand_limits(y = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)),
                      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) +
  expand_limits(x = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)),
                      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) +
  theme(plot.margin = unit(c(0.2, 0.2, 0.5, 0.5), "lines"))

# Horizontal marginal density plot - to appear at the top of the chart
p2 <- ggplot(mtcars, aes(x = mpg)) +
  geom_density() +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)),
                      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        plot.margin = unit(c(1, 0.2, -0.5, 0.5), "lines"))

# Vertical marginal density plot - to appear at the right of the chart
p3 <- ggplot(mtcars, aes(x = hp)) +
  geom_density() +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)),
                      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) +
  coord_flip() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        plot.margin = unit(c(0.2, 1, 0.5, -0.5), "lines"))

# Get the gtables
gt1 <- ggplot_gtable(ggplot_build(p1))
gt2 <- ggplot_gtable(ggplot_build(p2))
gt3 <- ggplot_gtable(ggplot_build(p3))

# Get maximum widths and heights for x-axis and y-axis title and text
maxWidth <- unit.pmax(gt1$widths[2:3], gt2$widths[2:3])
maxHeight <- unit.pmax(gt1$heights[4:5], gt3$heights[4:5])

# Set the maximums in the gtables for gt1, gt2 and gt3
gt1$widths[2:3] <- as.list(maxWidth)
gt2$widths[2:3] <- as.list(maxWidth)
gt1$heights[4:5] <- as.list(maxHeight)
gt3$heights[4:5] <- as.list(maxHeight)

# Combine the scatterplot with the two marginal density plots
# Create a new gtable
gt <- gtable(widths = unit(c(7, 2), "null"), heights = unit(c(2, 7), "null"))

# Insert gt1, gt2 and gt3 into the new gtable
gt <- gtable_add_grob(gt, gt1, 2, 1)
gt <- gtable_add_grob(gt, gt2, 1, 1)
gt <- gtable_add_grob(gt, gt3, 2, 2)

# And render the plot
grid.newpage()
grid.draw(gt)

# Draw a box around the combined plot
grid.rect(x = 0.5, y = 0.5, height = 0.995, width = 0.995, default.units = "npc",
          gp = gpar(col = "black", fill = NA, lwd = 1))