
Creating Interactive Plots with R and Highcharts

[This article was first published on RStudio, and kindly contributed to R-bloggers.]
By Mine Cetinkaya-Rundel

Sometimes great ideas come from trying to solve simple problems. This seems to be especially true for software developers who are willing to put an unreasonable amount of effort into solving a simple problem to their satisfaction. As the story goes, Torstein Hønsi, the founder and Chief Product Officer of Highcharts, was looking for a simple charting tool to update his homepage with snow depth measurements from Vikjafjellet, the local mountain where his family keeps a cabin. Frustrated with the common Flash plug-ins and other proprietary solutions available at the time, he decided to build a standards-based solution of his own and then, of course, share it.

In this post, I'll use Joshua Kunst's highcharter package, a wrapper for the Highcharts JavaScript library, along with Shiny to create some pretty slick plots. Please note that Highcharts products are free for non-commercial use. For use in commercial projects and websites, see https://shop.highsoft.com/.

The highcharter package enables the creation of Highcharts-type plots within R. There are two main functions in the package:
  1. highchart(), which creates a Highcharts chart object (an htmlwidget) that you build up by adding layers and options.
  2. hchart(), which draws a plot from an R object, such as a data frame, in a single command.

Plots are built in the spirit of ggplot2 by layering, although the layers are joined with the pipe operator (%>%) instead of +. Other attractive features of the package include ready-made themes and compatibility with Shiny. We will demonstrate the functionality of this package, as well as of Highcharts in general, through a series of visualization examples.
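The following is a minimal sketch of the layering idea. It starts with an empty highchart() and adds an axis, a series, and a title, each joined with %>%. The sales values, quarter labels, and title text are made up for illustration.

```r
library(highcharter)
library(magrittr)  # for the %>% pipe

# Made-up values, for illustration only
sales <- c(5, 3, 4, 7)

# Start from an empty chart and add one layer per call, much like ggplot2's +
hc <- highchart() %>%
  hc_xAxis(categories = c("Q1", "Q2", "Q3", "Q4")) %>%
  hc_add_series(data = sales, type = "line", name = "Sales") %>%
  hc_title(text = "A chart built layer by layer")

# Printing the object renders it (RStudio Viewer or a browser)
if (interactive()) print(hc)
```

Each hc_* function returns a modified copy of the chart. This means a partially built chart can be stored in a variable and extended later.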

Example 1: US births on Friday the 13th

The inspiration for this visualization is a FiveThirtyEight article titled Some People Are Too Superstitious To Have A Baby On Friday The 13th. FiveThirtyEight generously makes the data used in (some of) their articles available on their GitHub repository. The data used in this particular analysis can be found here. Our goal is to recreate this particular visualization. To do so, we need to calculate the difference between the number of births on the 13th of each month and the average number of births on the 6th and the 20th. We then aggregate these differences by day of the week. This is nothing a bit of dplyr and tidyr can't handle. Let's load the necessary packages:
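Written out for a single day of the week, the quantity computed below as diff_ppt is a percentage difference. Here, $\bar{b}_{13}$ is the mean number of births on 13ths falling on that weekday, and $\bar{b}_{6,20}$ is the mean over the 6ths and 20ths falling on that weekday. The code pools the 6th and 20th days before averaging.

$$
\text{diff\_ppt} = 100 \times \frac{\bar{b}_{13} - \bar{b}_{6,20}}{\bar{b}_{6,20}}
$$

For example, the Friday row of the results below has $\bar{b}_{13} = 11744.4$ and $\bar{b}_{6,20} = 12545.1$. That gives $100 \times (11744.4 - 12545.1) / 12545.1 = 100 \times (-800.7 / 12545.1) \approx -6.38$.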
library(highcharter)
library(dplyr)
library(tidyr)
and the data:
births <- read.csv("data/births.csv")
Then we’ll calculate the differences in births as described in the article and store the results in a new data frame called diff13:
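diff13 <- births %>%
  # keep only the 13th and the two comparison days
  filter(date_of_month %in% c(6, 13, 20)) %>%
  # label each row as the 13th or a comparison day
  mutate(day = ifelse(date_of_month == 13, "thirteen", "not_thirteen")) %>%
  # average births per day of the week for each label
  group_by(day_of_week, day) %>%
  summarise(mean_births = mean(births)) %>%
  arrange(day_of_week) %>%
  # one column per label: not_thirteen and thirteen
  spread(day, mean_births) %>%
  # percentage difference of the 13th relative to the comparison days
  mutate(diff_ppt = ((thirteen - not_thirteen) / not_thirteen) * 100)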
which looks like:
## Source: local data frame [7 x 4]
## Groups: day_of_week [7]
## 
##   day_of_week not_thirteen  thirteen   diff_ppt
##         <int>        <dbl>     <dbl>      <dbl>
## 1           1    11658.071 11431.429 -1.9440853
## 2           2    12900.417 12629.972 -2.0964008
## 3           3    12793.886 12424.886 -2.8841902
## 4           4    12735.145 12310.132 -3.3373249
## 5           5    12545.100 11744.400 -6.3825717
## 6           6     8650.625  8592.583 -0.6709534
## 7           7     7634.500  7557.676 -1.0062784
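tidyr has since superseded spread() with pivot_wider(). Below is a sketch of the same pipeline written with pivot_wider(). It runs on a tiny made-up data frame (toy, with invented birth counts) so the arithmetic can be checked by hand. It is an illustration, not the code behind the output above.

```r
library(dplyr)
library(tidyr)

# A tiny made-up data frame with the columns the pipeline uses
toy <- data.frame(
  day_of_week   = c(5, 5, 5, 6, 6, 6),
  date_of_month = c(6, 13, 20, 6, 13, 20),
  births        = c(12000, 11800, 12400, 8000, 8100, 8000)
)

toy_diff <- toy %>%
  filter(date_of_month %in% c(6, 13, 20)) %>%
  mutate(day = ifelse(date_of_month == 13, "thirteen", "not_thirteen")) %>%
  group_by(day_of_week, day) %>%
  summarise(mean_births = mean(births), .groups = "drop") %>%
  # pivot_wider() replaces the superseded spread(day, mean_births)
  pivot_wider(names_from = day, values_from = mean_births) %>%
  mutate(diff_ppt = (thirteen - not_thirteen) / not_thirteen * 100)

toy_diff
```

For day 5 the comparison mean is (12000 + 12400) / 2 = 12200, so diff_ppt is 100 × (11800 − 12200) / 12200 ≈ −3.28. For day 6 it is 100 × 100 / 8000 = 1.25.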
Note that the calculated percentage point differences (diff_ppt) may not match the ones in the visualization in the FiveThirtyEight article. There are two reasons for this:
  1. Holidays are excluded in the FiveThirtyEight analysis but not in this one.
  2. FiveThirtyEight provides two data files, one for the years 1994 to 2003 and another for the years 2000 to 2014. The numbers of births for the overlapping years (2000 – 2003) are not exactly the same in the two files. This analysis uses the SSA data for these years; however, it is unclear which data source FiveThirtyEight used for them.
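The rule in point 2 is to use the SSA numbers wherever the two files overlap. It can be sketched with dplyr as follows. The data frames cdc and ssa and their values are hypothetical stand-ins; the real files would need to be read in first and have more columns, but the filtering logic is the same.

```r
library(dplyr)

# Hypothetical stand-ins for the two files (values made up)
cdc <- data.frame(year = c(1999, 2000), births = c(100, 110))
ssa <- data.frame(year = c(2000, 2001), births = c(112, 120))

# Keep CDC rows only for years the SSA file does not cover, then append SSA
combined <- bind_rows(filter(cdc, !year %in% ssa$year), ssa)
combined
```

With this rule the overlapping year 2000 comes only from ssa, so no date is counted twice.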
Let’s start by making a very simple highchart of these data using the hchart() function:
hchart(diff13, "scatter", hcaes(x = day_of_week, y = diff_ppt))

This plot has some attractive features. For example, if you hover over the points, you can see the actual values of the plotted data. However, we need some customization to make the plot look like the one in the FiveThirtyEight article. We can achieve that using the highchart() function along with some customization functions. Note that we separate the layers with the pipe operator.
highchart() %>%
  hc_add_series(data = round(diff13$diff_ppt, 4), type = "column",
                name = "Difference, in ppt",
                color = "#F0A1EA", showInLegend = FALSE) %>%
  hc_yAxis(title = list(text = "Difference, in ppt"), allowDecimals = FALSE) %>%
  hc_xAxis(categories = c("Monday", "Tuesday", "Wednesday", "Thursday",
                          "Friday", "Saturday", "Sunday"),
           tickmarkPlacement = "on",
           opposite = TRUE) %>%
  hc_title(text = "The Friday the 13th effect",
           style = list(fontWeight = "bold")) %>%
  hc_subtitle(text = "Difference in the share of U.S. births on 13th of each month
                     from the average of births on the 6th and the 20th,
                     1994 - 2014") %>%
  hc_tooltip(valueDecimals = 4,
             pointFormat = "Day: {point.category} <br> Diff: {point.y}") %>%
  hc_credits(enabled = TRUE,
             text = "Sources: CDC/NCHS, SOCIAL SECURITY ADMINISTRATION",
             style = list(fontSize = "10px")) %>%
  hc_add_theme(hc_theme_538())

Once again, an attractive feature of this visualization is the hover tooltip. Themes also make it easy to change the look of the plot (in this case using hc_theme_538() gets us very close to the original visualization). Additionally, we are able to easily change labels (e.g., names of days) without having to make changes in the original data.
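To illustrate the point about themes, here is a minimal sketch that builds one chart and restyles it with three themes. The underlying data and labels are not touched. It uses the diff_ppt values from the diff13 output above, rounded to two decimals. hc_theme_economist() and hc_theme_darkunica() are other themes shipped with highcharter, assumed to be available in your installed version.

```r
library(highcharter)
library(magrittr)  # for the %>% pipe

days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
# diff_ppt values from the diff13 output, rounded to two decimals
diff_ppt <- c(-1.94, -2.10, -2.88, -3.34, -6.38, -0.67, -1.01)

# Build the chart once; labels live on the axis, not in the data
base_chart <- highchart() %>%
  hc_xAxis(categories = days) %>%
  hc_add_series(data = diff_ppt, type = "column", name = "Difference, in ppt")

# Restyle the same chart by swapping only the theme
chart_538       <- base_chart %>% hc_add_theme(hc_theme_538())
chart_economist <- base_chart %>% hc_add_theme(hc_theme_economist())
chart_dark      <- base_chart %>% hc_add_theme(hc_theme_darkunica())

if (interactive()) print(chart_economist)
```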

Example 2: US births on Friday the 13th, an interactive look

Since the highcharter package is powered by htmlwidgets, it is also Shiny compatible! To build a highchart within a Shiny app, we use the renderHighchart() function in the server, together with highchartOutput() in the UI. We have built an app that extends the visualization we created earlier, allowing custom selection of the years plotted, the type of plot, and the theme. A screenshot of the app is shown below, and you can view the app and the source code here.
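Below is a minimal sketch of the renderHighchart()/highchartOutput() pairing, assuming the shiny and highcharter packages are installed. It is not the source code of the app described above. It only lets the user switch the plot type, and the input ID type and output ID births_chart are hypothetical names. The data are the diff_ppt values from the diff13 output, rounded to two decimals.

```r
library(shiny)
library(highcharter)
library(magrittr)  # for the %>% pipe

days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
diff_ppt <- c(-1.94, -2.10, -2.88, -3.34, -6.38, -0.67, -1.01)

ui <- fluidPage(
  selectInput("type", "Plot type", choices = c("column", "line", "scatter")),
  # placeholder in the UI that renderHighchart() fills in
  highchartOutput("births_chart")
)

server <- function(input, output, session) {
  # re-runs whenever input$type changes
  output$births_chart <- renderHighchart({
    highchart() %>%
      hc_xAxis(categories = days) %>%
      hc_add_series(data = diff_ppt, type = input$type,
                    name = "Difference, in ppt") %>%
      hc_add_theme(hc_theme_538())
  })
}

# Launch only in an interactive session
if (interactive()) shinyApp(ui, server)
```

The output ID passed to highchartOutput() must match the name assigned on output in the server. This is how Shiny connects the two.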

Things to Look Out For

Highcharts' built-in, customizable hover/tooltip box and zooming functionality are among its most attractive features. However, whether these features are useful to you depends on your use case. For example, the tooltip is less useful when you plot many data points. Take a look at this plot of arrival vs. departure delays for flights from the New York airports to Los Angeles (LAX) in October 2013. The overplotting in the lower left of the plot makes the hover functionality much less useful.
library(nycflights13)
oct_lax_flights <- flights %>%
  filter(month == 10, dest == "LAX")
hchart(oct_lax_flights, "scatter", hcaes(x = dep_delay, y = arr_delay, group = origin))

However, if we aggregate the data to reduce the number of points plotted, this functionality becomes useful again. For example, below we group the flights into 15-minute intervals of departure delay and plot the median arrival delay for each interval.
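oct_lax_flights_agg <- oct_lax_flights %>%
  # bin departure delays into 15-minute intervals
  mutate(dep_delay_cat = cut(dep_delay, breaks = seq(-15, 255, 15))) %>%
  # drop flights outside the binned range and cancelled flights (NA delay)
  filter(!is.na(dep_delay_cat)) %>%
  group_by(origin, dep_delay_cat) %>%
  summarise(med_arr_delay = median(arr_delay, na.rm = TRUE))
hchart(oct_lax_flights_agg, "line", hcaes(x = dep_delay_cat, y = med_arr_delay, group = origin))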
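It helps to know how cut() assigns delays to bins. By default the intervals are closed on the right, so with breaks = seq(-15, 255, 15):
  - A delay of 0 falls in (-15,0], and a delay of 15 falls in (0,15].
  - Anything at or below -15, above 255, or missing becomes NA and would otherwise form a group of its own.

The delays below are made up for illustration.

```r
# A few made-up departure delays, in minutes
delays <- c(-10, 0, 14, 15, 300)

# 18 right-closed bins of 15 minutes each, from (-15,0] to (240,255]
bins <- cut(delays, breaks = seq(-15, 255, 15))
bins
```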

Summary

Highcharts provides high-quality, highly customizable web graphics, and the highcharter package allows R users to take full advantage of them. If you're interested in finding out more about the functionality of the package, I highly recommend browsing the highcharter package homepage, which contains a variety of Highcharts, Highstock, and Highmaps plots along with sample code to reproduce them. Additionally, the Highcharts Options Reference page is immensely useful for finding the specific syntax for customization options.

To leave a comment for the author, please follow the link and comment on their blog: RStudio.
