Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
So, I (unapologetically) did this to @Highcharts last week:
@hrbrmstr Your loss of words inspired this post!! https://t.co/3KO0BP0k0u @hadleywickham @ma_salmon @tdmv @bearloga @rushworth_a @awhstin
— Highcharts (@Highcharts) March 18, 2016
They did an awesome makeover (it’s interactive if you follow the link):
And, I’m not kidding, it’s actually a really good treemap. Not too many hierarchies or discrete categories. But, it’s still hard for humans to compare things without the aid of the interaction (which is totally fair, the Highcharts folks do interaction well). I always try to find an alternative to treemaps, usually through trying to figure out the story to tell. I think there’s at least one story in the Highcharts data that we can uncover with a different visualization. Ironically, the visualization I’ve chosen is a stacked bar chart (I don’t generally like them, either). I’ll frame the story and then dissect the code.
We looked at the number of frameworks being used with Highcharts across web-oriented programming languages. Surprisingly, four of the six top languages (Java, PHP, Python & dotNet) show Highcharts being used *without* an associated framework, which highlights the flexible nature of Highcharts. There seems to be, unsurprisingly, only one player in town when it comes to Ruby: Ruby on Rails, and the high prevalence of AngularJS tracks with Angular’s apparent dominance in JavaScript land. INSERT_MARKETING_LANGUAGE_HERE
In real life, I’d add a DataTables interactive table with this to let folks explore a bit more.
Making this in R & ggplot2
Highcharts used a Google Sheet to hold the data for their treemap makeover. That means we can have some fun with it in R. So, the two main story points are:
- show how the languages, and in-language frameworks rank against each other
- show the dominant framework in each language
As demonstrated, I’ve chosen to use stacked bar charts since there are only six languages and it turns out there is a dominant category for each.
One design criterion I set was to use the main or alternate color for each language and use a gradient to segment each in-language framework. I chose the yellow alternate color for Python since (besides it being such a cowardly language) there was enough blue in the chart already. Java & Ruby are separated enough that their slightly different reds aren’t too bad/confusing (and neither language left me with much of an alternative). I picked a green from the Mozilla palette for JavaScript since they seem to dominate any Google search for JavaScript info.
Let’s get libraries out of the way. I’m using my personal theme since I really don’t feel like typing everything out. If you need me to, drop a note and I’ll see what I can do.
library(googlesheets) # get the data
library(dplyr)        # reshape the data
library(ggplot2)      # plot
library(hrbrmisc)     # theme
library(scales)       # plot helpers
First, we need the data, and that’s where @jennybryan’s excellent googlesheets package comes into play:
sheet <- gs_key("1wYm5waQmiYKGhtdofvXDS8SHdh72Mwcnygvf3bvFfoU")
langs <- gs_read(sheet)
langs <- langs[-(1:6), 2:4]
We need to be able to order the programming languages by # of frameworks and we need the colors defined:
tops <- count(langs, parent, wt=value)
parent_cols <- c(Java="#960000", PHP="#8892bf", Python="#ffdc51",
                 JavaScript="#70ab2d", dotNet="#68217a", Ruby="#af1401")
To get bars and stacked segments sorted the right way, we need to add a helper column and arrange the overall data frame:
langs <- arrange(ungroup(mutate(group_by(langs, parent), rank=rank(value))), -rank)
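To see what the helper column does, here is a minimal sketch on made-up numbers (the `toy` data frame and its values are hypothetical, not the Highcharts data). `rank()` is computed within each language, and `arrange()` then sorts the whole frame by descending rank, so the biggest framework of each language comes before its smaller siblings:

```r
library(dplyr)

# Hypothetical toy data with the same three columns the post uses
toy <- data.frame(
  id     = c("Rails", "Sinatra", "Django", "Flask", "None"),
  parent = c("Ruby", "Ruby", "Python", "Python", "Python"),
  value  = c(50, 5, 20, 10, 40),
  stringsAsFactors = FALSE
)

# Same nesting as above: rank within language, drop the grouping, sort by -rank
ranked <- arrange(ungroup(mutate(group_by(toy, parent), rank = rank(value))), -rank)
ranked
```

Because the largest value in each language gets the highest rank, taking the first row per language afterwards yields the dominant framework.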
Next, we need to assign colors per language and in-language framework. I do this by computing an ordered alpha value for each framework, dependent on the number of frameworks in the language:
langs <- mutate(group_by(langs, parent),
                color=alpha(parent_cols[parent[1]], seq(1, 0.3, length.out=n())))
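As a rough illustration of that gradient, this sketch applies the same `alpha()` and `seq()` combination to a single base color outside of `mutate()`. The framework count of 4 is an assumption made up for the example:

```r
library(scales)

base_col     <- "#960000"  # the Java color from parent_cols
n_frameworks <- 4          # hypothetical number of frameworks in one language

# Evenly spaced opacities from fully opaque down to 30%
alphas <- seq(1, 0.3, length.out = n_frameworks)

# One shade per framework: same hue, decreasing opacity
shades <- alpha(base_col, alphas)
shades
```

A language with a single framework still works, since `seq(1, 0.3, length.out = 1)` returns just `1`, which is the fully opaque base color.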
Finally, we need the actual languages in factor order for ggplot:
langs$parent <- factor(langs$parent, levels=arrange(tops, n)$parent)
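Here is a small sketch of how the weighted count and the factor levels fit together, again on hypothetical toy numbers rather than the real sheet. `count(..., wt=value)` sums `value` per language into a column named `n`, and sorting by `n` gives the level order:

```r
library(dplyr)

# Hypothetical toy data, not the Highcharts numbers
toy <- data.frame(
  parent = c("Ruby", "Ruby", "Python", "Python", "Python"),
  value  = c(50, 5, 20, 10, 40),
  stringsAsFactors = FALSE
)

# Total per language (Python = 70, Ruby = 55)
tops <- count(toy, parent, wt = value)

# Levels run from smallest to largest total
toy$parent <- factor(toy$parent, levels = arrange(tops, n)$parent)
levels(toy$parent)
```

With `coord_flip()`, the last factor level is drawn at the top, so ascending levels put the language with the largest total at the top of the chart.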
We also need the dominant frameworks separated out so we can annotate them. Extra marks for ensuring they’re readable (black vs white depending on the base color):
top_f <- slice(group_by(langs, parent), 1)
top_f$color <- c("white", "white", "#2b2b2b", "#2b2b2b", "white", "white")
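The label colors above are picked by hand. If you would rather compute them, one possible alternative (not what the post does) is to choose black or white from the perceived brightness of the base color. The `label_color()` helper, its luma weights and its threshold of 128 are all assumptions for this sketch, and it may not reproduce the hand-picked choices exactly:

```r
# Pick a light or dark label color from the perceived brightness of a fill.
# Weights are the common Rec. 601 luma coefficients; the threshold is a guess.
label_color <- function(fill, threshold = 128, light = "white", dark = "#2b2b2b") {
  ch   <- grDevices::col2rgb(fill)  # 3 x n matrix of 0-255 channel values
  luma <- 0.299 * ch["red", ] + 0.587 * ch["green", ] + 0.114 * ch["blue", ]
  unname(ifelse(luma < threshold, light, dark))
}

# Dark red gets a white label, bright yellow gets a dark one
label_color(c("#960000", "#ffdc51"))
```

A fixed threshold is crude for mid-tone colors, so it is worth eyeballing the result before trusting it.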
With the data in the right format, the actual ggplot code isn’t too cumbersome:
gg <- ggplot()

# stack the bars. the bars themselves will be ordered by the language factor and our
# computed rank will stack them in the right order. we'll use an identity fill for
# the mapped fill aesthetic
gg <- gg + geom_bar(data=langs, stat="identity",
                    aes(x=parent, y=value, fill=color, order=rank),
                    color="white", size=0.15, width=0.65)

# text labels at the end of the bar means no need for any extra chart junk
gg <- gg + geom_text(data=tops, family="NoyhSlim-Medium",
                     aes(x=parent, y=n, label=n),
                     hjust=-0.2, size=3)

# here's how we label the dominant framework
gg <- gg + geom_text(data=top_f, family="NoyhSlim-Medium",
                     aes(x=parent, y=value/2, label=id, color=color),
                     hjust=0.5, size=3)

# we'll control our own panel breathing room, thanks anyway, ggplot2
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 900))

# these tell ggplot to use the color we've specified vs map it to a scale
gg <- gg + scale_color_identity()
gg <- gg + scale_fill_identity()

# the rest doesn't need 'splainin
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL,
                title="Popular web frameworks using Highcharts",
                subtitle="Total usage by language, including the most popular framework in-language",
                caption="Data graciously provided by Highcharts - http://jsfiddle.net/vidarbrekke/n6pd4jfo/")
gg <- gg + theme_hrbrmstr(grid=FALSE, axis="y")
gg <- gg + theme(legend.position="none")
gg <- gg + theme(axis.text.x=element_blank())
gg
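If you don’t have the hrbrmisc theme or the font installed, the core idea (precomputed colors plus identity scales) can be tried with ggplot2 alone. This is a stripped-down sketch under a few assumptions: the `toy` data and its colors are made up, `theme_minimal()` stands in for the personal theme, and `group=rank` replaces the `order` aesthetic, which was deprecated in ggplot2 2.0.0:

```r
library(ggplot2)

# Hypothetical toy data with precomputed rank and per-segment colors
toy <- data.frame(
  parent = c("Ruby", "Ruby", "Python", "Python", "Python"),
  id     = c("Rails", "Sinatra", "None", "Django", "Flask"),
  value  = c(50, 5, 40, 20, 10),
  rank   = c(2, 1, 3, 2, 1),
  color  = c("#af1401", "#af140180", "#ffdc51", "#ffdc51a0", "#ffdc5150"),
  stringsAsFactors = FALSE
)

# fill=color is used as-is thanks to scale_fill_identity();
# group=rank gives the segments a deterministic stacking order
p <- ggplot(toy, aes(x = parent, y = value, fill = color, group = rank)) +
  geom_bar(stat = "identity", color = "white", width = 0.65) +
  scale_fill_identity() +
  coord_flip() +
  labs(x = NULL, y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")
p
```

Without `scale_fill_identity()`, ggplot2 would treat the hex strings as category names and assign its own palette to them.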
If I wanted to kill more time, I’d’ve used the language logo vs the name in the axis.
Fin
What story/stories can you glean from the data and how would you tell them? Drop a note in the comments with your creation(s)!
Complete, contiguous code is in this gist.
Note that stacked bars aren’t always a replacement for treemaps and that treemaps do have valid uses. The important part is to choose the visualization that best supports the story you want to tell.