
Highlighting with ggplot2: The Old School and New School Way

[This article was first published on Little Miss Data, and kindly contributed to R-bloggers.]

A while back I tweeted about a really cool technique that can be used with ggplot2 in R to highlight a subset of your data while keeping the trend of the full data set in perspective. I found out about this trick through a bit of a tangled web, so please stay with me while I lay it out for you. It all started with a tweet that Mara Averick shared from a blog that Simon Jackson wrote about a technique that Hadley Wickham discussed in his ggplot2 book. Confused yet? Well, the good news is that actually implementing the technique is a lot easier than following the discovery path!

   

Highlighting The Old School Way

To implement this idea, we don’t need any fancy packages other than ggplot2.  The steps are simple:

  1. Using ggplot2, create a plot with your full data set in grey.
  2. Create a new data frame that has been subset to only include the data which you would like to highlight.  
  3. Add the highlighted data on to your plot created in step 1.  Set the color to something other than grey.  
  4. Celebrate!
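
As a minimal sketch of these steps that runs without any download, here is the same idea on R's built-in ChickWeight data. The data set, the 300 gram cutoff, and the names full, heavy_ids and highlighted are my own assumptions for illustration; they are not part of the 911 example.

```r
library(ggplot2)

# Built-in data: body weight of chicks over time, one line per chick
full <- as.data.frame(datasets::ChickWeight)
full$Chick <- as.character(full$Chick)

# Step 2: subset to the chicks to highlight
# (any chick whose weight ever exceeds 300 g, an arbitrary cutoff)
heavy_ids <- unique(full$Chick[full$weight > 300])
highlighted <- full[full$Chick %in% heavy_ids, ]

# Steps 1 and 3: a grey layer for the full data, a coloured layer for the subset
p_old <- ggplot() +
  geom_line(aes(Time, weight, group = Chick), data = full, colour = "grey70") +
  geom_line(aes(Time, weight, group = Chick, colour = Chick), data = highlighted)

p_old
```

The grey layer is drawn first so the coloured lines sit on top of it.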

Example

For our example, we are going to examine the crime incident dataset from Seattle 911 Calls on data.gov.  Note that I have covered this data set through multiple blog posts already such as map plots in R and time based heat maps.  

Install and Load Libraries

install.packages("lubridate")
install.packages("ggplot2")
install.packages("ggmap")
install.packages("data.table")
install.packages("ggrepel")
install.packages("dplyr")
install.packages("magrittr")

library(lubridate)
library(ggplot2)
library(ggmap)
library(dplyr)
library(data.table)
library(ggrepel)
library(magrittr)

Download the Data

incidents <- fread('https://raw.githubusercontent.com/lgellis/MiscTutorial/master/ggmap/i2Sample.csv', stringsAsFactors = FALSE)
str(incidents)

# Create some color variables for graphing later
custGrey = "#A9A9A9"

# Add date parts to the incidents data frame
# (lubridate:: is spelled out because data.table also exports year(), month(), wday() and hour())
incidents$ymd <- mdy_hms(incidents$Event.Clearance.Date)
incidents$month <- lubridate::month(incidents$ymd)
incidents$year <- lubridate::year(incidents$ymd)
incidents$wday <- lubridate::wday(incidents$ymd, label = TRUE)
incidents$hour <- lubridate::hour(incidents$ymd)

# Create a more manageable data frame with only data from 2017 onward
i2 <- incidents[year >= 2017, ]

# Only include complete cases (the result must be assigned, otherwise no rows are dropped)
i2 <- i2[complete.cases(i2), ]

head(i2)

Create a basic time series plot showing the count of 911 event types by month.  

# Group the data into a new data frame which has the count of events per month by subgroup

groupSummaries <- i2 %>%
  group_by(month, Event.Clearance.SubGroup) %>%
  summarize(N = n())

# View the new data set

head(groupSummaries, n = 100)
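
To see what this grouping step produces, here is a small sketch on a made-up data frame. The toy data and the names toy and toy_counts are hypothetical; dplyr's count() is used as a shorthand that I believe is equivalent to the group_by() and summarize() pair for this purpose, and it returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data standing in for i2: one row per 911 event
toy <- data.frame(
  month = c(1, 1, 1, 2, 2),
  Event.Clearance.SubGroup = c("THEFT", "THEFT", "NOISE", "THEFT", "NOISE")
)

# One row per month and subgroup, with the number of events in column N
toy_counts <- toy %>%
  count(month, Event.Clearance.SubGroup, name = "N")

toy_counts
```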

# Graph the data set through ggplot2
# (month is numeric, so a continuous scale with explicit breaks is used)

ggplot(groupSummaries, aes(x = month, y = N, color = Event.Clearance.SubGroup)) +
  geom_line() +
  theme(legend.position = "bottom", legend.text = element_text(size = 7),
        legend.title = element_blank()) +
  scale_x_continuous(name = "Month", breaks = c(3, 6, 9, 12))

   

 

Create a Graph Highlighting Data with a Max Month Count Greater Than 95

# Create a data frame with only event types that have had a peak of more than 95 calls in a month

groupSummariesF <- groupSummaries %>%
  group_by(Event.Clearance.SubGroup) %>% 
  filter(max(N) > 95) %>%
  ungroup()

head(groupSummariesF)

# Create a layered plot with one layer of grey data for the full data set and one layer of color data for the subset data set 

ggplot() +
  geom_line(aes(month, N, group = Event.Clearance.SubGroup), 
            data = groupSummaries, colour = alpha("grey", 0.7)) +
  geom_line(aes(month, N, group = Event.Clearance.SubGroup, colour = Event.Clearance.SubGroup), 
            data = groupSummariesF) +  
  scale_x_continuous(name = "Month", breaks = c(3, 6, 9, 12)) +
  theme(legend.position="bottom",legend.text=element_text(size=7),
        legend.title = element_blank())

   

 

One of the great things about the “old school way” of doing this type of highlighting is that it can be done with presumably every extension to the ggplot2 package. For example, you can use this same technique to highlight with the ggmap package. The code for these graphs is incredibly simple and has been included in my GitHub repo.

 

Highlighting The New School Way

While the above methodology is quite easy, it can be a bit of a pain at times to create and add the new data frame.  Further, you have to tinker more with the labelling to really call out the highlighted data points.  
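
To give a feel for that labelling work, here is a rough sketch that labels the end of each highlighted line by hand with ggrepel. It uses the built-in ChickWeight data rather than the 911 data, and the 300 gram cutoff and the names chicks, highlighted and line_ends are assumptions made for illustration.

```r
library(ggplot2)
library(ggrepel)

chicks <- as.data.frame(datasets::ChickWeight)
chicks$Chick <- as.character(chicks$Chick)

# Subset to highlight: chicks whose weight ever exceeds 300 g (arbitrary cutoff)
heavy_ids <- unique(chicks$Chick[chicks$weight > 300])
highlighted <- chicks[chicks$Chick %in% heavy_ids, ]

# One row per highlighted chick: its last recorded measurement
last_time <- ave(highlighted$Time, highlighted$Chick, FUN = max)
line_ends <- highlighted[highlighted$Time == last_time, ]

ggplot() +
  geom_line(aes(Time, weight, group = Chick), data = chicks, colour = "grey70") +
  geom_line(aes(Time, weight, colour = Chick), data = highlighted) +
  geom_text_repel(aes(Time, weight, label = Chick, colour = Chick),
                  data = line_ends, show.legend = FALSE) +
  theme(legend.position = "none")
```

That is a third data frame and a third layer just to get the labels on.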

Thanks to Hiroaki Yutani, we now have the gghighlight package, which does most of the work for us with a small function call! Please note that a lot of this code was created by looking at examples in the package's introduction document.

The new school way is even simpler:

  1. Using ggplot2, create a plot with your full data set.
  2. Add the gghighlight() function to your plot with the conditions set to identify your subset.
  3. Celebrate.
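
As a minimal sketch of these steps on built-in data (assuming the gghighlight package is installed), the plot below highlights chicks from the ChickWeight data set whose peak weight exceeds 300 grams. The data set and the cutoff are assumptions chosen for illustration only.

```r
library(ggplot2)
library(gghighlight)

chicks <- as.data.frame(datasets::ChickWeight)
chicks$Chick <- as.character(chicks$Chick)

# Step 1: an ordinary plot of the full data
# Step 2: gghighlight() keeps the lines matching the condition in colour
#         and greys out the rest
p_new <- ggplot(chicks, aes(Time, weight, colour = Chick)) +
  geom_line() +
  gghighlight(max(weight) > 300)

p_new
```

Because the plot has a colour grouping, the condition is evaluated once per chick, which is why an aggregate such as max() is used.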

Example

For our first example, we are going to create the same time series graph from above.  However, we are going to perform the highlighting with gghighlight vs manual layering.  

# Install the gghighlight package

install.packages("gghighlight")
library(gghighlight)

# Create the highlighted graph

ggplot(groupSummaries, aes(month, N, colour = Event.Clearance.SubGroup)) +
  geom_line() + 
  gghighlight(max(N) > 95,  label_key = Event.Clearance.SubGroup) +  
  scale_x_continuous(name = "Month", breaks = c(3, 6, 9, 12))
   

More Examples

Well, that was so easy that we are going to try a few more ggplot2 plot types to see how we fare. Below are both a scatterplot and a histogram.

# Try a scatterplot chart

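# use_group_by is an argument of gghighlight(), not an aesthetic;
# FALSE makes the condition apply to each point rather than to each group
ggplot(groupSummaries, aes(month, N, colour = Event.Clearance.SubGroup)) +
  geom_point() +
  gghighlight(N > 200, label_key = Event.Clearance.SubGroup, use_group_by = FALSE) +
  scale_x_continuous(name = "Month", breaks = c(3, 6, 9, 12))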

# Try a histogram chart

ggplot(groupSummaries, aes(N, fill = Event.Clearance.SubGroup)) +
  geom_histogram() +
  theme(legend.position="bottom",legend.text=element_text(size=7),
        legend.title = element_blank()) +
  gghighlight(N > 100,  label_key = Event.Clearance.SubGroup, use_group_by = FALSE) +
  facet_wrap(~ Event.Clearance.SubGroup)

THANK YOU

Thanks for reading along while we explored data highlighting through layers and gghighlight. Please share your thoughts and creations with me on Twitter.

Note that the full code is available on my GitHub repo. If you have trouble downloading the file from GitHub, go to the main page of the repo and select “Clone or Download” and then “Download Zip”.

