Hacking Highcharter: observations per group in boxplots
Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R.
Highcharter has a nice simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots, and there are a few solutions: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to get the number of observations, which could then be displayed in a tool-tip on mouse-over.
There wasn’t, so I wrote one like this.
First, you’ll need to install highcharter from GitHub to make it work with the latest dplyr.
Next, we generate a reproducible dataset using the wakefield package. For some reason, we want to look at age by gender, but only for redheads:
library(dplyr)
library(tidyr)
library(highcharter)
library(wakefield)
library(tibble)

set.seed(1001)

# wakefield capitalises the generated column names (Age, Gender, Hair)
sample_data <- r_data_frame(
  n = 1000,
  age(x = 10:90),
  gender,
  hair
) %>%
  filter(Hair == "Red")

sample_data %>%
  count(Gender)

## # A tibble: 2 x 2
##   Gender     n
##   <fctr> <int>
## 1   Male    62
## 2 Female    48
That gives us 62 male and 48 female redheads. The tibble package is required because later on, our boxplot function calls the function has_name from that package.
The standard hcboxplot function shows us, on mouse-over, the summary data used in the boxplot, as in the image below.
hcboxplot(x = sample_data$Age, var = sample_data$Gender) %>%
  hc_chart(type = "column")
To replace that with the number of observations per group, we need to edit the function. In RStudio, View(hcboxplot) will open a tab with the (read-only) code, which can be copy/pasted and edited. Look for the function named get_box_values, which uses the R boxplot.stats function to generate a data frame:
get_box_values <- function(x) {
  boxplot.stats(x)$stats %>%
    t() %>%
    as.data.frame() %>%
    setNames(c("low", "q1", "median", "q3", "high"))
}
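For orientation, here is a small sketch of what boxplot.stats returns, using a made-up vector rather than the redhead data. Besides the five-number summary in $stats, the result already carries the count of non-missing values in $n, which is the piece we need for the tooltip.

```r
# Illustrative values only; 30 is included to show how outliers are reported.
x <- c(2, 4, 4, 5, 6, 7, 8, 30)

bs <- boxplot.stats(x)

print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$n)      # number of non-NA observations (outliers are still counted)
print(bs$out)    # points lying beyond the whiskers
```

Note that the whisker ends in $stats are the most extreme points that are not outliers, so the "high" value here stops short of 30.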
Edit get_box_values to look like this. The new version just adds a column obs with the number of observations:
get_box_values <- function(x) {
  boxplot.stats(x)$stats %>%
    t() %>%
    cbind(boxplot.stats(x)$n) %>%
    as.data.frame() %>%
    setNames(c("low", "q1", "median", "q3", "high", "obs"))
}
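To check what the extra column will contain before wiring it into a chart, the same idea can be tried outside highcharter. The sketch below is a base-R stand-in, not the code to paste into the copied function: the name get_box_values_n and the toy age/gender vectors are made up for illustration, and it calls boxplot.stats once instead of twice.

```r
# Hypothetical stand-alone helper: same columns as the edited get_box_values,
# written without the pipe so it needs no packages.
get_box_values_n <- function(x) {
  s <- boxplot.stats(x)                      # compute once, reuse $stats and $n
  out <- as.data.frame(t(c(s$stats, s$n)))   # one row, six columns
  setNames(out, c("low", "q1", "median", "q3", "high", "obs"))
}

# Toy data (not the wakefield sample); one missing age in the Male group.
age    <- c(23, 35, 41, 58, 62, NA, 19, 27, 33, 45, 51, 77)
gender <- c(rep("Male", 6), rep("Female", 6))

# One row of box values per group, roughly what the chart needs per category.
per_group <- do.call(rbind, lapply(split(age, gender), get_box_values_n))
print(per_group)
```

Because boxplot.stats drops missing values before counting, obs can be smaller than a row count from count() when a group contains NA, as the Male group does here.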
Save the edited copy of hcboxplot under a new name, for example my_hcboxplot. Now we can customise the tooltip to use the obs property of the point object:
my_hcboxplot(x = sample_data$Age, var = sample_data$Gender) %>%
  hc_chart(type = "column") %>%
  hc_tooltip(pointFormat = 'n = {point.obs}')
Voilà.
Filed under: R, statistics