Choosing colour palettes. Part II: Educated Choices.
There are many resources on the use of colours in R, several packages, and a number of schemes already implemented in ggplot2. In the previous part, we saw how ggplot2 selects a default colour palette according to the type of variable, discrete or continuous. There are further options, explored in this part.
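As a quick recap of that default behaviour, here is a minimal sketch. It assumes the mpg data set shipped with ggplot2; the variables class and cty were picked only for illustration.

```r
library(ggplot2)

# Common base plot; mpg is a demo data set bundled with ggplot2
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Discrete variable mapped to colour: ggplot2 picks evenly spaced hues
p_discrete <- base + geom_point(aes(colour = class))

# Continuous variable mapped to colour: ggplot2 picks a colour gradient
p_continuous <- base + geom_point(aes(colour = cty))

p_discrete
p_continuous
```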
Choosing colours for a graphic is always some kind of compromise. On the one hand, you want the computer, some algorithm, to choose a sensible colour scheme and automatically pick the required number of colours from this scale. On the other hand, there are always external human preferences that constrain the choices, and these are not always easy to formalise.
Some choices, even ones prevalent in the literature such as the rainbow colour scale (known in Matlab as the flashy jet colormap), are just not good enough. They introduce artefacts, highlight regions of the data that should have a smooth transition with their surroundings, and do not degrade gracefully in black-and-white print, or when viewed by colour-impaired people.
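The black-and-white problem can be checked roughly in base R. The sketch below estimates the grey level each colour would print at, using the Rec. 601 luma weights. Those weights, the helper name luma, and the choice of 7 colours are assumptions made for illustration; the HCL settings (chroma 100, luminance 65, hues starting at 15) mirror the ggplot2 defaults for discrete scales.

```r
# Approximate grey level of each colour (0 = black, 1 = white),
# using Rec. 601 luma weights; a rough proxy, not a full colour model
luma <- function(cols) {
  rgb <- col2rgb(cols) / 255                 # 3 x n matrix: red, green, blue rows
  as.vector(c(0.299, 0.587, 0.114) %*% rgb)  # weighted sum per colour
}

n <- 7
rainbow_cols <- rainbow(n)                   # base R rainbow palette

# Evenly spaced hues at fixed chroma and luminance
hues <- seq(15, 375, length.out = n + 1)[1:n]
hcl_cols <- hcl(h = hues, c = 100, l = 65)

round(luma(rainbow_cols), 2)  # grey levels jump widely from one hue to the next
round(luma(hcl_cols), 2)      # grey levels stay within a much narrower band
```

The rainbow palette mixes very dark and very light colours in no meaningful order, which is what produces artificial bands in greyscale; the HCL colours remain close to one another in lightness.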
If good colours for scientific graphics are not in the eye of the beholder, what should guide the choice?
A recent blog post illustrates the search for a pleasing colour scheme in bar graphs. On the default HCL (Hue Chroma Luminance) choice of ggplot2 for discrete variables, the author remarks:
“The colour choice is not a bad one, but there’s something about the intensity of the colours that makes me want to find a new set of colours somewhat more soothing to my eyes.”
and documents his heuristic search for satisfying colours:
“I shuffled through many different colours on the Color Hex website, and nothing else seemed to work with me as I wasn’t selecting colours based on any theory”
A good discussion is offered in the colorspace package and its accompanying vignettes and papers, e.g. Escaping RGBland: Selecting Colors for Statistical Graphics:
“Despite this omnipresence of color, there is often only little guidance in statistical software packages on how to choose a palette appropriate for a particular visualization task”
In this instance, I would argue that the hcl colour scale of ggplot2 is a good start for a well-balanced graphic that doesn’t draw attention to a particular colour. If the colours are too flashy in bar plots (large areas), the chroma (saturation) and luminance can easily be muted by tuning the scale.
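A minimal sketch of such tuning uses the c (chroma) and l (luminance) arguments of scale_fill_hue(). The mpg data set and the particular values 40 and 80 are assumptions chosen for illustration, not recommended settings.

```r
library(ggplot2)

# Bar plot of a discrete variable; mpg is a demo data set bundled with ggplot2
p <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar()

# Default hue scale: chroma 100, luminance 65
p

# Lower chroma and higher luminance give softer, paler fills
# (40 and 80 are illustrative values; adjust to taste)
muted <- p + scale_fill_hue(c = 40, l = 80)
muted
```

Because only chroma and luminance change, the hues stay evenly spaced and no single bar stands out more than the others.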
Bar plots and maps can also benefit from trying a few different colour palettes from the excellent ColorBrewer website. An interface is provided in R and ggplot2 through the RColorBrewer package.
The RColorBrewer package
With scale_colour_brewer(), it is trivial to choose among 35 palettes (see RColorBrewer::display.brewer.all()), which fall into three families:
Sequential palettes are suited to ordered data that progress from low to high. Lightness steps dominate the look of these schemes, with light colours for low data values and dark colours for high data values.
Qualitative palettes do not imply magnitude differences between legend classes; hues are used to create the primary visual differences between classes. Qualitative schemes are best suited to representing nominal or categorical data.
Diverging palettes put equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colours, and the low and high extremes are emphasized with dark colours that have contrasting hues.
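The three families can be tried out with a short sketch. Blues, Set2 and RdBu are standard ColorBrewer palette names; the mpg data set and the variables used are assumptions picked for illustration.

```r
library(ggplot2)
library(RColorBrewer)

# One palette from each family, as vectors of 5 hex colours
seq_cols  <- brewer.pal(5, "Blues")  # sequential: light to dark
qual_cols <- brewer.pal(5, "Set2")   # qualitative: distinct hues
div_cols  <- brewer.pal(5, "RdBu")   # diverging: two hues around a light midpoint

# Qualitative palette for a categorical fill in a bar plot
ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set2")

# Sequential palette for an ordered variable mapped to point colour
ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point() +
  scale_colour_brewer(palette = "Blues")
```

The table RColorBrewer::brewer.pal.info lists every palette with its family and the maximum number of colours it supports, which helps when a variable has many levels.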
In the next post, we’ll look at some special cases where the user might want finer control over these scales, or define completely new colour palettes tailored for a specific graphic.