Creating Interactive Plots with R and Highcharts
[This article was first published on RStudio, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By Mine Cetinkaya-Rundel
Sometimes great ideas come from trying to solve simple problems. This seems to be especially true for software developers who are willing to put in an unreasonable amount of effort to solve a simple problem to their satisfaction. So the story goes that Torstein Hønsi, the founder and Chief Product Officer of Highcharts, was looking for a simple charting tool for updating his homepage with snow depth measurements from Vikjafjellet, the local mountain where his family keeps a cabin. Frustrated with the common Flash plug-ins and other proprietary solutions available at the time, he decided to build a standards-based solution of his own and then, of course, share it.
In this post, I’ll use Joshua Kunst’s highcharter package, a wrapper for the Highcharts JavaScript library, along with Shiny to create some pretty slick plots.
Please note that all products in this library are free for non-commercial use. For use in commercial projects and websites, see https://shop.highsoft.com/.
The highcharter package enables the creation of Highcharts type plots within R.
There are two main functions in the package:
- highchart(): Creates a Highchart chart object using htmlwidgets. The widget can be rendered on HTML pages generated from R Markdown, Shiny, or other applications.
- hchart(): Uses highchart() to draw a plot for different R object classes using a convenient single command. Specifically, it can plot data frames, numeric, histogram, character, density, factors, ts, mts, xts, stl, ohlc, acf, forecast, mforecast, ets, igraph, dist, dendrogram, phylo, and survfit classes.
Plots are built up layer by layer by chaining functions with the pipe operator (%>%) instead of the + used in ggplot2.
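To make the difference between the two functions concrete, here is a minimal sketch on the built-in mtcars data, which is our own choice and not from the post. It assumes a recent highcharter version in which hchart() takes column mappings through hcaes(); the names p1 and p2 are only illustrative.
```r
library(highcharter)
# Assumes a highcharter version with hcaes(); mtcars is illustrative data only
# hchart(): a single call, with columns mapped via hcaes()
p1 <- hchart(mtcars, "scatter", hcaes(x = wt, y = mpg))
# highchart(): start from an empty chart and add layers with %>%
p2 <- highchart() %>%
  hc_add_series(data = mtcars$mpg, type = "column", name = "mpg") %>%
  hc_title(text = "Miles per gallon by car")
# Both objects are htmlwidgets; printing one renders it in the Viewer or an HTML page
class(p1)
```
hchart() is the quick route from an R object to a plot, while highchart() leaves every series and option to you.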
Other attractive features of the package are:
- Theming: It is possible to configure your plots with pre-implemented themes like Economist, Financial Times, Google, and FiveThirtyEight among others.
- Plugins: Motion, drag points, fontawesome, url-pattern, annotations.
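Themes are applied to an existing chart with hc_add_theme(). The sketch below is a minimal illustration, again using mtcars as stand-in data (our assumption, not the post’s); the theme constructors shown are ones that ship with highcharter.
```r
library(highcharter)
# Illustrative base chart; mtcars is stand-in data, not from the post
p <- hchart(mtcars, "scatter", hcaes(x = wt, y = mpg))
# Each pre-implemented theme is a function returning a theme object
p_econ <- p %>% hc_add_theme(hc_theme_economist())
p_ft   <- p %>% hc_add_theme(hc_theme_ft())
p_538  <- p %>% hc_add_theme(hc_theme_538())
```
Because the theme is a separate layer, the same chart can be restyled without touching its series or data.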
Example 1: US births on Friday the 13th
The inspiration for this visualization is a FiveThirtyEight article titled Some People Are Too Superstitious To Have A Baby On Friday The 13th. FiveThirtyEight generously makes the data used in (some of) their articles available on their GitHub repository. The data used in this particular analysis can be found here. Our goal is to recreate this particular visualization. To do so, we need to calculate the differences between the number of births on the 13th and the average number of births on the 6th and 20th of each month, and aggregate these values by day of the week. This is nothing a bit of dplyr and tidyr can’t handle. Let’s load the necessary packages:
library(highcharter)
library(dplyr)
library(tidyr)
Next, we load the data:
births <- read.csv("data/births.csv")
Then, we calculate the differences and store them in a data frame called diff13:
diff13 <- births %>%
  # keep only the 6th, 13th, and 20th of each month
  filter(date_of_month %in% c(6, 13, 20)) %>%
  # label each row as the 13th or not
  mutate(day = ifelse(date_of_month == 13, "thirteen", "not_thirteen")) %>%
  # mean births per day of the week, separately for the 13th and the 6th/20th
  group_by(day_of_week, day) %>%
  summarise(mean_births = mean(births)) %>%
  arrange(day_of_week) %>%
  # one column per day type
  spread(day, mean_births) %>%
  # relative difference of the 13th from the 6th/20th average, times 100
  mutate(diff_ppt = ((thirteen - not_thirteen) / not_thirteen) * 100)
diff13
## Source: local data frame [7 x 4]
## Groups: day_of_week [7]
##
##   day_of_week not_thirteen  thirteen   diff_ppt
##         <int>        <dbl>     <dbl>      <dbl>
## 1           1    11658.071 11431.429 -1.9440853
## 2           2    12900.417 12629.972 -2.0964008
## 3           3    12793.886 12424.886 -2.8841902
## 4           4    12735.145 12310.132 -3.3373249
## 5           5    12545.100 11744.400 -6.3825717
## 6           6     8650.625  8592.583 -0.6709534
## 7           7     7634.500  7557.676 -1.0062784
Note that the calculated percentage point differences (diff_ppt) may not match the ones in the visualization in the FiveThirtyEight article. There are two reasons for this:
- Holidays are excluded in the FiveThirtyEight analysis but not in this one.
- FiveThirtyEight provides two data files, one for the years 1994 to 2003 and another for the years 2000 to 2014. The numbers of births for the overlapping years (2000 – 2003) are not exactly the same. This app uses the SSA data for these years; however, it is unclear which data source FiveThirtyEight used for them.
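As a quick check of the formula in the final mutate() step, the sketch below recomputes the Friday value (day_of_week 5) by hand from the two means printed in the diff13 output. The means are copied from that output; since they are printed to three decimals, this reproduces diff_ppt only to a few decimal places.
```r
# Means for Friday (day_of_week == 5), copied from the printed diff13 output
not_thirteen <- 12545.100
thirteen     <- 11744.400
# Same formula as the mutate() step: relative difference, times 100
diff_ppt <- (thirteen - not_thirteen) / not_thirteen * 100
round(diff_ppt, 2)  # -6.38
```
In other words, Fridays that fall on the 13th see roughly 6.4% fewer births than the average of the 6th and 20th.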
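We can create a quick first plot of diff_ppt against day_of_week with the hchart() function (recent versions of highcharter map columns with hcaes()):
hchart(diff13, "scatter", hcaes(x = day_of_week, y = diff_ppt))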
highchart() %>%
  # one column per day of the week
  hc_add_series(data = round(diff13$diff_ppt, 4), type = "column",
                name = "Difference, in ppt", color = "#F0A1EA",
                showInLegend = FALSE) %>%
  hc_yAxis(title = list(text = "Difference, in ppt"), allowDecimals = FALSE) %>%
  # day names are supplied as axis categories, so the data stay unchanged
  hc_xAxis(categories = c("Monday", "Tuesday", "Wednesday", "Thursday",
                          "Friday", "Saturday", "Sunday"),
           tickmarkPlacement = "on", opposite = TRUE) %>%
  hc_title(text = "The Friday the 13th effect",
           style = list(fontWeight = "bold")) %>%
  hc_subtitle(text = "Difference in the share of U.S. births on 13th of each month from the average of births on the 6th and the 20th, 1994 - 2004") %>%
  hc_tooltip(valueDecimals = 4,
             pointFormat = "Day: {point.x} <br> Diff: {point.y}") %>%
  hc_credits(enabled = TRUE,
             text = "Sources: CDC/NCHS, SOCIAL SECURITY ADMINISTRATION",
             style = list(fontSize = "10px")) %>%
  # FiveThirtyEight-style theme
  hc_add_theme(hc_theme_538())
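Building the plot layer by layer with highchart() gives us much finer control over its appearance (and the pre-implemented theme hc_theme_538() gets us very close to the original visualization). Additionally, we are able to easily change labels (e.g., names of days) without having to make changes in the original data.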
Example 2: US births on Friday the 13th, an interactive look
Since the highcharter package is powered by htmlwidgets, it is also Shiny compatible! To build a highchart within a Shiny app, we use the renderHighchart() function in the server, paired with highchartOutput() in the UI.
We have built an app that extends the visualization we created earlier, allowing for custom selection of years plotted, type of plot, and theme. A screenshot of the app is shown below, and you can view the app and the source code here.
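For readers who want the bare wiring before looking at the full app, here is a hypothetical minimal app, not the one from the post. The input ID type, the output ID hcontainer, and the use of mtcars are our own assumptions; the point is only how highchartOutput() and renderHighchart() pair up.
```r
library(shiny)
library(highcharter)
# Hypothetical minimal app: IDs "type" and "hcontainer" are illustrative names
ui <- fluidPage(
  selectInput("type", "Plot type", choices = c("column", "line", "scatter")),
  highchartOutput("hcontainer", height = "400px")
)
server <- function(input, output) {
  # Re-rendered whenever the selected plot type changes
  output$hcontainer <- renderHighchart({
    hchart(mtcars, input$type, hcaes(x = wt, y = mpg))
  })
}
app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app locally
```
Assigning the result of shinyApp() builds the app object without launching it, which keeps the sketch safe to source.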
Things to Look Out For
Highcharts’ built-in and customizable hover/tooltip box and zooming functionality are among its most attractive features. However, whether these features are useful depends on your use case. For example, the tooltip is less useful when you plot many data points, since overlapping points are hard to hover over individually. Take a look at this plot of arrival vs. departure delays of flights headed to Los Angeles (LAX) in October 2013 from the various New York airports. The overplotting in the lower left of the plot makes the hover functionality of little use.
library(nycflights13)
oct_lax_flights <- flights %>%
  filter(month == 10, dest == "LAX")
hchart(oct_lax_flights, "scatter",
       hcaes(x = dep_delay, y = arr_delay, group = origin))
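Aggregating the data first can help. For example, we can bin departure delays into 15-minute intervals and plot the median arrival delay in each bin, separately for each origin airport:
oct_lax_flights_agg <- oct_lax_flights %>%
  # 15-minute departure delay bins
  mutate(dep_delay_cat = cut(dep_delay, breaks = seq(-15, 255, 15))) %>%
  group_by(origin, dep_delay_cat) %>%
  summarise(med_arr_delay = median(arr_delay, na.rm = TRUE))
hchart(oct_lax_flights_agg, "line",
       hcaes(x = dep_delay_cat, y = med_arr_delay, group = origin))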
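The binning relies on base R’s cut(), whose intervals are closed on the right by default. The sketch below uses a few made-up delay values (purely illustrative) with the same breaks to show how values land in bins, and that anything at or below -15 or above 255 minutes becomes NA.
```r
# Same breaks as above: 15-minute bins from -15 to 255 minutes
breaks <- seq(-15, 255, 15)
# Made-up delays, chosen only to show edge cases
delays <- c(-20, -10, 0, 20, 300)
bins <- cut(delays, breaks = breaks)
as.character(bins)  # NA "(-15,0]" "(-15,0]" "(15,30]" NA
nlevels(bins)       # 18 bins
```
Since group_by() keeps NA as its own group, out-of-range flights would show up as a separate NA category in the aggregated data; filtering them out before summarising is one option worth considering.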
Summary
Highcharts provides highly customizable, high-quality web graphics, and the highcharter package allows R users to take full advantage of them. If you’re interested in finding out more about the functionality of the package, I highly recommend browsing the highcharter package homepage, which contains a variety of Highcharts, Highstock, and Highmaps plots along with sample code to reproduce them. Additionally, the Highcharts Options Reference page is immensely useful for finding the specific syntax for customization options.