Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this blog post I explore the question of how to color maps in a way that ensures optimal readability, while using a small set of colors. I also introduce the new MapColoring R package, which provides a practical solution to this problem in a user friendly manner.
The Problem
In my work, I often encounter situations where I need to plot a map of polygon features (such as countries, administrative units, or ethnic groups) simply to demonstrate the extent and quality of the spatial information. In other words, I don’t need to display any attribute data, but simply draw a (non-choropleth) map. How should I color the polygons? There are two easy, but often suboptimal options. First, I could use the same color for all features. This approach has the disadvantage that the individual regions are often difficult to tell apart, especially if there are islands or enclaves. Consider, for instance, the following map of Swiss cantons:
Clearly, it is very difficult to distinguish between small cantons and enclaves (of which there are some). Second, I could use a distinct color for all features. This makes distinguishing between areas easier, but often looks just terrible:
A more effective solution would be one where we use a small number of colors, but ensure all regions are clearly and easily identifiable. This leads to an interesting question: How do we color a map with as few colors as possible while ensuring that no two adjacent features are of the same color?
Map Coloring and Graph Theory
It turns out that this problem has a fairly long history. Wikipedia informs us that British cartographer Francis Guthrie described the issue in 1852 when mapping English counties, and proposed what is known as the Four-Color-Theorem. Essentially, the latter states that (under certain conditions, discussed below) any map can be colored according to the above-described rules with at most four colors. Interestingly, the theorem was only proven in 1976 by mathematicians Kenneth Appel and Wolfgang Haken.
An important aspect of map coloring is that it is a specific case of a more general problem called graph coloring. Here, the goal is to color all vertices of a graph such that no vertices sharing an edge have the same color. In this context, the minimal number of colors with which a graph can be colored is called its chromatic number.
Our map-related problem can easily be restated in terms of graph theory: every polygon feature is a vertex, and edges between vertices represent adjacent polygons on the map. If transformed in this manner, most maps can be represented as planar graphs, that is, graphs that can be drawn such that no edges cross each other. In fact, the Four-Color-Theorem introduced above applies to all planar graphs, and states that the chromatic number of any planar graph is at most four.
Does this mean that in practice, all we ever need to color a map are four colors? Unfortunately not. The transformation from maps to planar graphs only works under certain conditions. First, we need to define adjacency to mean that two regions share at least one border segment of positive length. If we assume that regions are adjacent if they share only a point (or corner), then maps don’t automatically generate planar graphs. To see that this is the case, note that a hypothetical map of five regions that all share a common border point would result in the complete graph of five vertices, which is not planar. Second, we have no guarantee of receiving a planar graph if the map we transform features enclaves. We may again invoke a counterexample to prove that this is the case. Assume a map of five regions, whereas each region has an exclave in every other region. Again, this results in the (non-planar) complete graph of five vertices.
As an illustration that real-world maps don’t necessarily translate to planar graphs, see the following depiction of Swiss cantons as a network. Here I use the “standard” adjacency definition whereas only shared border segments of positive length qualify as adjacent:
Visual inspection suggests that there are crossing edges in central Switzerland, owing to the fact that the territory of Obwalden is not contiguous.
It follows that in order to color a wide range of maps, we require a coloring method for arbitrary graphs. Unfortunately, there is no equivalent to the Four-Color-Theorem for arbitrary graphs. Indeed, finding an arbitrary graph’s chromatic number and coloring it accordingly is NP-hard. Fortunately, however, there are a number of fast algorithms that yield satisfactory (albeit sometimes suboptimal) solutions to the graph coloring problem.
The MapColoring R Package
In the following, I describe an R package I created that uses the greedy DSATUR graph coloring algorithm by Daniel Brélaz to assign colors to a collection of polygons. In addition to simply solving the map coloring problem, the package also allows to select colors in a manner that ensures maximal visual contrast between adjacent features. Specifically, if provided a vector of candidate colors (for example from a palette) of length N and a set of polygon features, the “getOptimalContrast()” function selects K<N colors, and assigns them to the polygons, such that (i) no two adjacent features share the same color, (ii) K is as small as possible, and (iii) the mean color contrast between adjacent polygons is as large as possible. Conditions (i) and (ii) are met by transforming the collection of polygons into an adjacency matrix describing a graph and then applying the DSATUR algorithm. The discrete optimization problem entailed by condition (iii) is solved using the CEoptim function in the package of the same name.
The new MapColoring package’s core functionality is contained in the three functions
- getNColors
- getColoring
- getOptimalContrast
To demonstrate these functions, I use the sp package to create a toy map in the shape of a chess board:
## Make chess board library(sp) gt <- GridTopology(c(0,0), c(1,1), c(8,8)) sg <- SpatialGrid(gt) board <- as(as(sg, "SpatialPixels"), "SpatialPolygons")
The getNColors function uses the greedy DSATUR algorithm to calculate the number of colors required to color the provided SpatialPolygons* object such that no two adjacent features have the same color. Note that if provided a SpatialPolygons* object, the getNColors function (as well as the getColoring and getOptimalContrast functions) construct an adjacency matrix based on the assumption that only features sharing at least one border segment of positive length are adjacent – thus, features sharing only a corner are ignored. If you’d like to adopt a different adjacency rule, you can directly pass a logical adjacency matrix (instead of a SpatialPolygons* object) to the MapColoring functions. In the next code segment, I pass the chess board, which is a SpatialPolygons object, to the getNColors function.
## Get estimate for chromatic number Ncol <- getNColors(board)
As we would expect, this returns a value of 2.
The getColoring function employs the greedy DSATUR algorithm to “color” the features of a SpatialPolygons* object (or an adjacency matrix). I write “color” because the function does not return actual colors, but an integer vector with a color index for each of the features in the provided map. Hence, if we pass the chess board to the getColoring function…
## Get coloring indices coloring <- getColoring(board)
… we get an integer vector of length 64 containing the values 1, 2, 1, 2, 1, etc.
Finally, as explained above, the getOptimalContrast function goes a step further, and instead of only assigning color indices, it returns actual colors. Provided a map and a vector of candidate colors, it selects and allocates these colors in a manner such that they match the indexing returned by the getColoring function, and ensures that the average contrast between adjacent polygons is maximized. Color contrasts are measured using their Contrast Ratio, as defined here.
In the following code segment, I pass the chess board together with 20 colors from R’s Rainbow palette to the getOptimalContrast function:
## Get Optimal contrast colors cand.colors <- rainbow(20) opt.colors <- getOptimalContrast(x=board, col=cand.colors) ## Plot plot(board, col=opt.colors)
The result looks as follows:
Finally, as a more realistic example, I use the getOptimalContrast function to plot a world map, obtained from the cshapes package. Moreover, instead of relying on the adjacency definition employed above, I define two countries to be adjacent if their borders are within 4 decimal degrees of each other. One reason one might want to do this is to avoid confusion over the status of islands. If the adjacency definition used above is adopted, an island and a proximate region on the mainland may be assigned the same color even though they do not belong to the same country because they do not share a border. Finally, as the set of candidate colors, I use a palette from the RColorBrewer package.
## Country borders in 2012 library(cshapes) world.spdf <- cshp(as.Date("2012-06-30")) ## Use loose definition of adjacent library(rgeos) adj.mat <- gIntersects(gBuffer(world.spdf, width=2, byid=TRUE), byid=TRUE) ## Get Optimal contrast colors library(RColorBrewer) cand.colors <- brewer.pal(11, "RdGy") opt.colors <- getOptimalContrast(x=adj.mat, col=cand.colors) ## Plot plot(world.spdf, col=opt.colors)
The result of this exercise looks like this:
For the time being, you can download the MapColoring R package from my ETH website, here. As soon as I find the time, I’ll upload it to my GitHub page.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.