[This article was first published on Bayes Ball, and kindly contributed to R-bloggers.]
A couple of days ago (2016-03-12) a short blog post by Bob Rudis appeared on R-bloggers.com, “Subtitles in ggplot2”. I was intrigued by the idea and what this could mean for my own plotting efforts, and it turned out to be very simple to apply. (Note that Bob’s post originally appeared on his own blog, as “Subtitles in ggplot2”.)
In order to see if I could create a plot with a subtitle, I went back to some of my own code drawing on the Lahman database package. The code below summarizes the data using dplyr, and creates a ggplot2 plot showing the annual average number of runs scored by each team in every season from 1901 through 2014, including a trend line using the loess smoothing method.

This is an update to my series of blog posts, most recently 2015-01-06, visualizing run scoring trends in Major League Baseball.

```r
# load the package into R, and open the data table 'Teams' into the
# workspace
library(Lahman)
data(Teams)
#
# package load
library(dplyr)
library(ggplot2)
#
# CREATE SUMMARY TABLE
# ====================
# create a new dataframe that
# - filters from 1901 [the establishment of the American League] to the most recent year,
# - filters out the Federal League
# - summarizes the total number of runs scored, runs allowed, and games played
# - calculates the league runs and runs allowed per game

MLB_RPG <- Teams %>%
  filter(yearID > 1900, lgID != "FL") %>%
  group_by(yearID) %>%
  summarise(R=sum(R), RA=sum(RA), G=sum(G)) %>%
  mutate(leagueRPG=R/G, leagueRAPG=RA/G)
```
Plot the MLB runs per game trend
Below is the code to create the plot, including the formatting. Note the `hjust=0` (for horizontal justification = left) in the `plot.title` line. This is because the default for the title is to be centred, while the subtitle is to be justified to the left.

```r
MLBRPGplot <- ggplot(MLB_RPG, aes(x=yearID, y=leagueRPG)) +
  geom_point() +
  theme_bw() +
  theme(panel.grid.minor = element_line(colour="gray95")) +
  scale_x_continuous(breaks = seq(1900, 2015, by = 20)) +
  scale_y_continuous(limits = c(3, 6), breaks = seq(3, 6, by = 1)) +
  xlab("year") +
  ylab("team runs per game") +
  # trend line; with fewer than 1,000 points geom_smooth() defaults to loess
  geom_smooth(span = 0.25) +
  ggtitle("MLB run scoring, 1901-2014") +
  # left-justify the title so that it lines up with the subtitle
  theme(plot.title = element_text(hjust=0, size=16))

MLBRPGplot
```
Figure: MLB run scoring, 1901-2014
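To make the effect of `hjust` in the title line concrete, here is a minimal sketch that builds the same plot three times with the title justified left, centre, and right. It uses the built-in `mtcars` data purely as a stand-in, and the object names (`base_plot`, `p_left`, and so on) are illustrative rather than taken from the original code.

```r
library(ggplot2)

# stand-in plot; mtcars ships with R and is used only for illustration
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Title alignment")

# hjust runs from 0 (left) through 0.5 (centre) to 1 (right)
p_left   <- base_plot + theme(plot.title = element_text(hjust = 0))
p_centre <- base_plot + theme(plot.title = element_text(hjust = 0.5))
p_right  <- base_plot + theme(plot.title = element_text(hjust = 1))

# print one of them to see the result
p_left
```

Setting `hjust` explicitly, as the plot code above does, keeps the title alignment the same regardless of the theme default.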
Adding a subtitle: the function
So now we have a nice-looking dot plot showing the average number of runs scored per game for the years 1901-2014. But a popular feature of charts, particularly in magazines, is a subtitle that summarizes what the chart shows and/or what the author wants to emphasize.
In this case, we could legitimately say something like any of the following:
- The peak of run scoring in the 2000 season has been followed by a steady drop
- Teams scored 20% fewer runs in 2014 than in 2000
- Team run scoring has fallen to just over 4 runs per game from the 2000 peak of 5 runs
- Run scoring has been falling for 15 years, reversing a 30 year upward trend
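Statements like these can be checked against the summary table before they go into a subtitle. The sketch below rebuilds the runs-per-game table and compares the 2000 peak with 2014, the last season plotted. It assumes the installed version of the Lahman package contains both seasons; `rpg_2000`, `rpg_2014` and `pct_change` are names introduced here for illustration.

```r
library(Lahman)
library(dplyr)

data(Teams)

# same summary as above, reduced to the columns needed here
MLB_RPG <- Teams %>%
  filter(yearID > 1900, lgID != "FL") %>%
  group_by(yearID) %>%
  summarise(R = sum(R), G = sum(G)) %>%
  mutate(leagueRPG = R / G)

# runs per team per game in the two seasons being compared
rpg_2000 <- MLB_RPG$leagueRPG[MLB_RPG$yearID == 2000]
rpg_2014 <- MLB_RPG$leagueRPG[MLB_RPG$yearID == 2014]

# percentage change from 2000 to 2014 (negative means a decline)
pct_change <- 100 * (rpg_2014 - rpg_2000) / rpg_2000

round(c(rpg_2000 = rpg_2000, rpg_2014 = rpg_2014, pct_change = pct_change), 2)
```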
How can we add a subtitle to our chart that does that?
The function Bob Rudis has created quickly and easily allows us to add a subtitle. The following code is adapted from his blog post. Note that the function relies on two additional packages, grid and gtable, which are loaded first.

```r
library(grid)
library(gtable)

ggplot_with_subtitle <- function(gg,
                                 label="",
                                 family=NULL,
                                 size=10,
                                 hjust=0, vjust=0,
                                 bottom_margin=5.5,
                                 newpage=is.null(vp),
                                 vp=NULL,
                                 ...) {

  # text settings for the subtitle; grid's gpar() names these
  # parameters fontfamily and fontsize
  if (is.null(family)) {
    gpr <- gpar(fontsize=size, ...)
  } else {
    gpr <- gpar(fontfamily=family, fontsize=size, ...)
  }

  # the subtitle text, positioned by hjust (x) and vjust (y)
  subtitle <- textGrob(label, x=unit(hjust, "npc"), y=unit(vjust, "npc"),
                       hjust=hjust, vjust=vjust,
                       gp=gpr)

  # convert the plot to a gtable so that rows can be added to it
  data <- ggplot_build(gg)

  gt <- ggplot_gtable(data)
  # add a row below the title, put the subtitle in it,
  # then add a spacer row below the subtitle
  gt <- gtable_add_rows(gt, grobHeight(subtitle), 2)
  gt <- gtable_add_grob(gt, subtitle, 3, 4, 3, 4, 8, "off", "subtitle")
  gt <- gtable_add_rows(gt, grid::unit(bottom_margin, "pt"), 3)

  if (newpage) grid.newpage()

  # draw on the current device, inside a viewport if one was supplied
  if (is.null(vp)) {
    grid.draw(gt)
  } else {
    if (is.character(vp)) seekViewport(vp) else pushViewport(vp)
    grid.draw(gt)
    upViewport()
  }

  invisible(data)

}
```
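The core of the function is the pair of gtable calls that open up a new row and place a text grob in it. As a rough illustration of that mechanism on its own, the sketch below builds a toy two-row table (a title row and a panel row, both invented for this example) and inserts a subtitle row between them. A real ggplot2 gtable has many more rows and columns, and its layout can differ between ggplot2 versions, so the row and column numbers here are not those used in the function above.

```r
library(grid)
library(gtable)

# toy layout: one column, a title row and a panel row
gt <- gtable(widths = unit(1, "null"),
             heights = unit(c(1, 1), c("lines", "null")))
gt <- gtable_add_grob(gt, textGrob("Title"), t = 1, l = 1, name = "title")
gt <- gtable_add_grob(gt, rectGrob(), t = 2, l = 1, name = "panel")

# the grob to insert, left-justified
label <- textGrob("a subtitle", x = unit(0, "npc"), hjust = 0)

# add a row after row 1, sized to the height of the text;
# grobs below the insertion point are shifted down by one row
gt2 <- gtable_add_rows(gt, grobHeight(label), pos = 1)

# place the text in the new row (row 2), with clipping turned off
gt2 <- gtable_add_grob(gt2, label, t = 2, l = 1, clip = "off", name = "subtitle")

# inspect where each grob now sits
gt2$layout[, c("t", "l", "b", "r", "name")]
```

Printing the `layout` of a plot's own gtable in the same way is a quick check on which row and column the title occupies before adapting the row numbers.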
Adding a subtitle
- Assign the active plot object to `gg` (simply because that’s what Bob’s code uses)
- Define the text that we want to be in the subtitle
- Call the function

```r
# set the name of the current plot object to `gg`
gg <- MLBRPGplot

# define the subtitle text
subtitle <- "Run scoring has been falling for 15 years, reversing a 30 year upward trend"

ggplot_with_subtitle(gg, subtitle,
                     bottom_margin=20, lineheight=0.9)
```
Figure: MLB run scoring, 1901-2014 with a subtitle
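One practical consequence of the function drawing with `grid.draw()` and returning the build data invisibly is that there is no plot object left to hand to a saving function. A common pattern for grid-drawn output is to open a graphics device, draw, and close the device. The sketch below shows that pattern with a stand-in plot from the built-in `mtcars` data and a hypothetical output path; replacing the `grid.draw(gt)` line with a call to `ggplot_with_subtitle()` would be expected to work the same way.

```r
library(ggplot2)
library(grid)

# stand-in plot built from a dataset that ships with R
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# ggplotGrob() converts the plot to a gtable, the same kind of object
# that ggplot_with_subtitle() assembles before drawing
gt <- ggplotGrob(p)

# hypothetical output path, in the session's temporary directory
out_file <- file.path(tempdir(), "grid_drawn_plot.pdf")

# open the device, draw, and close the device to write the file
pdf(out_file, width = 8, height = 5)
grid.draw(gt)
invisible(dev.off())
```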
Wasn’t that easy? Thanks, Bob!
And it’s going to get easier: in the few days since his blog post, Bob has taken this into the ggplot2 development environment, working on the code necessary to add this as a simple extension to the package’s already extensive functionality. And Jan Schulz has chimed in, adding the ability to add a text annotation (e.g. the data source) under the plot. It’s early days, but it’s looking great. (See ggplot2 pull request #1582.) Thanks, Bob and Jan!

And thanks also to the rest of the ggplot2 developers, for making it possible for those of us who use the package to create good-looking and effective data visualizations. Ain’t open development great?

The code for this post (as an R markdown file) can be found in my Bayesball github repo.
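For readers on a later ggplot2 release: the subtitle and caption support discussed above is available through `labs()` from version 2.2.0 onward, so the helper function is no longer needed there. The following is a minimal sketch under that assumption. It also adds an upper bound of 2014 to the filter, because newer versions of the Lahman package include later seasons, and the caption text is an illustrative placeholder.

```r
library(Lahman)
library(dplyr)
library(ggplot2)

data(Teams)

# same summary as in the post, limited to the seasons named in the title
MLB_RPG <- Teams %>%
  filter(yearID > 1900, yearID <= 2014, lgID != "FL") %>%
  group_by(yearID) %>%
  summarise(R = sum(R), G = sum(G)) %>%
  mutate(leagueRPG = R / G)

p <- ggplot(MLB_RPG, aes(x = yearID, y = leagueRPG)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.25) +
  theme_bw() +
  labs(
    title = "MLB run scoring, 1901-2014",
    subtitle = "Run scoring has been falling for 15 years, reversing a 30 year upward trend",
    caption = "Source: Lahman database package",  # placeholder caption
    x = "year",
    y = "team runs per game"
  ) +
  theme(plot.title = element_text(hjust = 0, size = 16))

p
```

Because the result is an ordinary ggplot object, it can be printed, modified, or saved in the usual ways.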
-30-