Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Color is often used to display an extra dimension in plots of scientific data. Unfortunately, not everyone decodes color in the same way. This is especially true for people with color vision deficiency, which in its two most common forms affects up to 8 percent of males. As a result, it has been estimated that the probability of a given plot reaching at least one reviewer with some form of color vision deficiency, in a group of three male reviewers, is approximately 22%. Hopefully, this number alone is compelling enough for us to keep these viewers in mind whenever we create figures. The truth, however, is that your figures aren’t only seen by reviewers: they are seen by a much wider group that includes readers of your paper, members of the audience when you present your work, visitors to your lab’s website, and potentially many others. As your audience grows, your color choices become more and more important for communicating your work effectively.
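The 22% figure is consistent with a simple calculation, assuming each of three reviewers independently has an 8% chance of having color vision deficiency. A minimal sketch of that arithmetic in R (the variable names are illustrative):

```r
# Assumed per-person prevalence of color vision deficiency among males (8%)
p_cvd <- 0.08
n_reviewers <- 3

# Probability that at least one of three independent reviewers has CVD:
# one minus the probability that none of them do
p_at_least_one <- 1 - (1 - p_cvd)^n_reviewers
print(round(p_at_least_one, 3))
# [1] 0.221
```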
Although there are many outstanding tools for creating beautiful plots, practically all of them have default color palettes that can present decoding challenges for individuals with color vision deficiencies. This post is an introduction to creating plots and figures with more accessible color palettes. For the examples below, I use the excellent ggplot2 package for R. The same ideas and colors can easily be transferred to your particular tool of choice.
Using Color to Represent Categorical Data
When using color to encode categorical data, such as blood type, gender, or bacterial strain, it is important to choose a color palette that has as many easily differentiable colors as there are categories. The figure below shows one palette that can encode up to 8 values, along with a simulation of how each of its colors appears to someone with protanopia, deuteranopia, or tritanopia.
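The eight colors of this palette (the same hex values used in the scale_color_manual examples in this post) can be stored once in a vector and reused. In this sketch, the name cb_palette and the descriptive color names in the comments are my own labels, not part of any package. The vector is deliberately left unnamed, since scale_color_manual matches the names of a named vector against the category values.

```r
# Colorblind-friendly palette described above, as hex RGB triplets.
# The vector name cb_palette is an illustrative choice.
cb_palette <- c("#000000",  # black
                "#E69F00",  # orange
                "#56B4E9",  # sky blue
                "#009E73",  # bluish green
                "#F0E442",  # yellow
                "#0072B2",  # blue
                "#D55E00",  # vermillion
                "#CC79A7")  # reddish purple

# col2rgb() stops with an error on an invalid color, so this doubles as a sanity check
print(col2rgb(cb_palette))
```

With the vector defined, a call such as scale_color_manual(values=cb_palette) can replace the long inline list of colors.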
With ggplot2, the color palette for categorical data can be set using scale_color_manual (for points, lines, and outlines) and scale_fill_manual (for boxes, bars, and ribbons). Both take a vector of colors through their values argument; the colors can be given as hex RGB triplets or by name. As an example, let’s look at the relationship between the weight (in carats) and the price of diamonds in the diamonds data set included with ggplot2. We can use color to indicate the quality of the cut. Note that this data set is quite large (about 54,000 diamonds), so this scatter plot might not be the most informative way to display these data.
library(ggplot2)
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()
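scale_color_manual assigns the first color given to the first category, the second color to the second category, and so on. Categories are taken in the order of the factor’s levels: for character data these are sorted alphabetically unless an ordering is specified, while cut is an ordered factor whose levels run from Fair to Ideal. Using the colors from the colorblind-safe palette shown above: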
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() + scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))
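Because colors are assigned by position, it helps to check the order of the factor levels before choosing colors. The sketch below (variable names are illustrative) prints the levels of cut, pairs each level with the color it receives from the palette above, and shows how factor() overrides the default alphabetical ordering of character data, using blood types as a hypothetical example.

```r
library(ggplot2)  # provides the diamonds data set

# Same colorblind-friendly palette as in the example above
cb_palette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Levels of cut, in the order scale_color_manual() assigns colors
cut_levels <- levels(diamonds$cut)
print(cut_levels)

# Pair each level with the color it receives (first level -> first color, ...)
assigned <- setNames(cb_palette[seq_along(cut_levels)], cut_levels)
print(assigned)

# Character data gets alphabetical levels by default; factor() can override that
blood <- c("O", "A", "B", "AB", "A")
print(levels(factor(blood)))                                   # alphabetical
print(levels(factor(blood, levels = c("O", "A", "B", "AB"))))  # custom order
```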
Alternatively, if you don’t want to remember the ordering of your categories, or if you want to assign a specific color to each category, you can name each color after the category it belongs to:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() + scale_color_manual(values=c("Fair"="#E69F00", "Good"="#56B4E9", "Premium"="#009E73", "Ideal"="#F0E442", "Very Good"="#0072B2"))
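The same approach works for filled areas through scale_fill_manual. A minimal sketch on a bar chart of counts per cut; cut_colors is an illustrative name, and the colors reuse the name-to-color mapping above. Inspecting the built plot confirms which fill each bar received.

```r
library(ggplot2)

# Named vector: each cut level is matched to its color by name
cut_colors <- c("Fair" = "#E69F00", "Good" = "#56B4E9", "Very Good" = "#0072B2",
                "Premium" = "#009E73", "Ideal" = "#F0E442")

# Bars are filled areas, so the fill scale (not the color scale) applies
p <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar() +
  scale_fill_manual(values = cut_colors)
print(p)

# Inspect the fill colors ggplot2 actually assigned to each bar
built <- ggplot_build(p)
print(built$data[[1]][, c("x", "fill")])
```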
Redundant Encodings
When describing a figure, it is common to refer to a specific color. Hopefully, you’re now convinced that not everyone sees color the same way, especially when a standard red, green, blue color palette is used. It is also common for figures to be printed in black and white, or for a printer to run low on magenta ink. To improve legibility when your figures aren’t reproduced exactly as created, consider using redundant encodings. As an example, we can use both shapes and colors to refer to categories:
ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) + geom_point() + scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))
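Because cut is an ordered factor, recent versions of ggplot2 may warn that shapes are not advised for ordinal variables, and the default shape palette handles at most six values. One way to take control, sketched below, is to choose the shapes explicitly with scale_shape_manual, which replaces the default ordinal shape scale. The particular shape codes are an arbitrary choice.

```r
library(ggplot2)

cb_palette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Explicit shape codes, one per cut level:
# filled circle, filled triangle, filled square, plus, square with an x
cut_shapes <- c(16, 17, 15, 3, 7)

p <- ggplot(diamonds, aes(x = carat, y = price, color = cut, shape = cut)) +
  geom_point() +
  scale_color_manual(values = cb_palette) +
  scale_shape_manual(values = cut_shapes)
print(p)
```

Since the color and shape scales are both driven by cut and share the same title, ggplot2 merges them into a single legend, so each legend key shows both encodings at once.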
Redundant encoding also helps in figure captions, where referring to a category as “the blue squares” is helpful both for those with color vision deficiencies and for those with printer troubles (all of us?). However, if the data can be represented with symbols as well as with colors, this raises the question that should always be asked: Are colors absolutely necessary?
Using ColorBrewer Palettes
No discussion of color palettes would be complete without mentioning Cynthia Brewer’s ColorBrewer, an excellent source for color palettes that includes both colorblind-safe and print-friendly palettes. In ggplot2, ColorBrewer palettes can be applied with the scale_color_brewer command (or scale_fill_brewer for fills).
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() + scale_color_brewer(palette="Dark2")
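To find out which ColorBrewer palettes are flagged as colorblind-safe, the RColorBrewer package (the same package used below to fetch palettes with brewer.pal) ships a brewer.pal.info table with a colorblind column. A sketch that lists the qualitative colorblind-safe palettes and fetches five Dark2 colors, which should correspond to what scale_color_brewer uses for the five cut levels:

```r
library(RColorBrewer)

# brewer.pal.info: one row per palette, with maxcolors, category, and colorblind columns
info <- brewer.pal.info

# Qualitative palettes flagged as colorblind-friendly
cb_qual <- rownames(info[info$category == "qual" & info$colorblind, ])
print(cb_qual)

# Five hex colors from the Dark2 palette, one per cut level
dark2 <- brewer.pal(n = 5, name = "Dark2")
print(dark2)

# Optional: draw every palette flagged as colorblind-friendly
display.brewer.all(colorblindFriendly = TRUE)
```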
Using Color to Represent Continuous Values
When using color to represent continuous values, special care should be taken to ensure not only that the chosen colors are differentiable, but also that viewers interpret a change in value of a given magnitude similarly throughout the spectrum. The rainbow color map, which is the default in many graphics packages, does not do this well. Color palettes that vary not only in hue but also in saturation and lightness can produce more linear changes in perception.
Of course, color gradients can introduce additional problems for viewers with color vision deficiencies when certain areas of the spectrum are included. For these viewers, palettes that vary uniformly in lightness, as a greyscale palette does, are the most accessible. Again, always ask yourself whether the information conveyed by color could be encoded in another way.
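One rough way to check whether a gradient varies steadily in lightness is to convert each color to an approximate luminance and test whether the values change monotonically. The sketch below uses the common 0.299R + 0.587G + 0.114B luma weighting, which is a simple approximation rather than a perceptual lightness measure such as CIELAB L*, and compares base R’s rainbow() with gray.colors(). The helper names approx_luma and is_monotonic are hypothetical.

```r
# Approximate luma of each color from its sRGB components (0-255 scale)
approx_luma <- function(colors) {
  rgb <- col2rgb(colors)
  as.vector(c(0.299, 0.587, 0.114) %*% rgb)
}

# TRUE when a sequence only ever increases or only ever decreases
is_monotonic <- function(x) all(diff(x) >= 0) || all(diff(x) <= 0)

rainbow_luma <- approx_luma(rainbow(7))
grey_luma    <- approx_luma(gray.colors(7))

print(round(rainbow_luma))
print(round(grey_luma))

is_monotonic(rainbow_luma)  # FALSE: lightness rises and falls along the rainbow
is_monotonic(grey_luma)     # TRUE: the greyscale ramp changes lightness steadily
```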
ggplot2 includes a number of functions for making continuous color scales, such as scale_color_gradient, scale_color_gradientn, and scale_color_continuous, each with a scale_fill_* counterpart for filled areas. (scale_color_grey, by contrast, is intended for discrete data.) To demonstrate, I’ll switch to the mtcars data set, which contains, among other things, the fuel economy of 32 cars from the 1973-1974 model years.
# Example borrowed from the geom_tile documentation
library(ggplot2)
ggplot(mtcars, aes(y=factor(cyl), x=mpg)) + stat_density(aes(fill=after_stat(density)), geom="tile", position="identity")
Fortunately, ggplot2 does a nice job of displaying continuous values with color by default. Alternatively, we can use the RColorBrewer package to fetch palettes from ColorBrewer (the “PuBuGn” palette in this case) and apply them using the scale_fill_gradientn command:
library(RColorBrewer)  # provides brewer.pal()
ggplot(mtcars, aes(y=factor(cyl), x=mpg)) + stat_density(aes(fill=after_stat(density)), geom="tile", position="identity") + scale_fill_gradientn(colours=brewer.pal(n=8, name="PuBuGn"))
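Following the point above that palettes varying only in lightness are the most accessible, a continuous greyscale fill can be built with scale_fill_gradient by choosing grey endpoints. This is a sketch; the grey90 and grey10 endpoints (light for low density, dark for high) are an arbitrary choice.

```r
library(ggplot2)

# Same density-tile plot as above, mapped onto a light-to-dark grey ramp
p <- ggplot(mtcars, aes(y = factor(cyl), x = mpg)) +
  stat_density(aes(fill = after_stat(density)), geom = "tile", position = "identity") +
  scale_fill_gradient(low = "grey90", high = "grey10")
print(p)
```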
Further Reading
- Robert Simmon’s Subtleties of Color (part 1, 2, 3, 4, 5, 6)
- Bang Wong’s “Color blindness” in Nature Methods, part of the excellent Points of View column
- Okabe and Ito’s “Color Universal Design (CUD) – How to make figures and presentations that are friendly to Colorblind people”
- Coblis color blind simulator
- Borland, D.; Taylor, R.M., “Rainbow Color Map (Still) Considered Harmful”, IEEE Computer Graphics and Applications, vol. 27, no. 2, pp. 14-17, March-April 2007.
- Cookbook for R entry for using colors in ggplot2