Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The heatmap is a useful graphical tool in any data scientist's arsenal. It represents numeric data arranged in a 2-dimensional grid, with the value of each cell shown as a color. It's a natural fit for data that's in a grid already (say, a correlation matrix). But it's also useful for data that can be arranged in a grid, like quantities in a calendar, as a way of comparing clusters, or simply as a combination of two categorical or discrete variables.
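As a quick illustration of the correlation-matrix case, here is a minimal sketch using only base R. The choice of the built-in mtcars data and the blue-white-red palette are my own assumptions for the example, not something prescribed by any package.

```r
# Correlation matrix of the numeric columns in the built-in mtcars data
corr <- cor(mtcars)

# A diverging palette: blue for negative, red for positive correlations
pal <- colorRampPalette(c("blue", "white", "red"))(25)

# Rowv/Colv = NA keeps the original variable order (no clustering),
# and scale = "none" leaves the correlations untransformed
heatmap(corr, Rowv = NA, Colv = NA, scale = "none",
        col = pal, main = "Correlations in mtcars")
```

Setting scale = "none" matters here: by default heatmap() rescales each row, which would distort values that are already on a common scale.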
The base R heatmap function does a good job of generating basic heatmaps (this FlowingData tutorial showcases its capabilities), but if you want to put anything on the margins besides labels you're going to need something more powerful. The superheat package by Rebecca Barter (currently available only on GitHub) provides many additional capabilities for basic heatmaps (like ordering the rows/columns, or choosing a color scheme) and also the option to supercharge the heatmap with annotations and additional data visualizations in the margins. Here are a few examples:
Add a scatterplot (or boxplot) to one of the margins (details and code):
Color the labels by another variable (here, Human Development Ranking, also represented as a bar chart — details and code here):
Or add dendrograms (perhaps from a clustering process) to the margins (details and code):
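To give a flavor of how margin plots like these are requested, here is a rough sketch on the built-in mtcars data. It is not the code behind the figures above; the data set and axis label are assumptions for illustration, and the calls only run if superheat has already been installed from GitHub.

```r
# One-time install from GitHub (assumes the devtools package is available):
# devtools::install_github("rlbarter/superheat")

# Heatmap body: every mtcars column except mpg, which goes in the margin
heat_data <- mtcars[, names(mtcars) != "mpg"]
mpg <- mtcars$mpg

if (requireNamespace("superheat", quietly = TRUE)) {
  # Scatterplot of mpg to the right of the rows
  superheat::superheat(
    heat_data,
    scale = TRUE,                 # centre and scale each column
    pretty.order.rows = TRUE,     # order rows by hierarchical clustering
    yr = mpg,
    yr.plot.type = "scatter",
    yr.axis.name = "miles per gallon"
  )

  # Dendrogram next to the rows instead of a margin plot
  superheat::superheat(heat_data, scale = TRUE, row.dendrogram = TRUE)
}
```

The yr argument takes one value per row of the heatmap; the corresponding yt arguments do the same for columns along the top margin.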
While the superheat package uses the ggplot2 package internally, it doesn't itself follow the grammar of graphics paradigm: the function is more like a traditional base R graphics function with a couple of dozen options, and it creates a plot directly rather than returning a ggplot2 object that can be further customized. But as long as the options cover your heatmap needs (and that's likely), you should find it a useful tool next time you need to represent data on a grid.
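One practical consequence of drawing directly is that saving a plot follows the usual graphics-device pattern rather than a ggplot2-style save. A minimal sketch, assuming superheat is installed; the file name and pixel dimensions are arbitrary choices for the example:

```r
# Hypothetical output location; any writable path would do
out_file <- file.path(tempdir(), "superheat-example.png")

if (requireNamespace("superheat", quietly = TRUE)) {
  png(out_file, width = 800, height = 900)   # open a PNG graphics device
  superheat::superheat(mtcars, scale = TRUE) # draws straight onto the device
  dev.off()                                  # close the device to write the file
}
```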
The superheat package apparently works with any R version after 3.1 (and I can confirm it works on the most recent, R 3.3.2). This arXiv paper provides some details and several case studies, and you can find more examples here. Check out the vignette for detailed usage instructions, and download it from its GitHub repository linked below.
GitHub (rlbarter): superheat