Overcoming D3 Cartographic Envy With R + ggplot
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When I used one of the Scotland TopoJSON files for a recent post, it really hit me just how much D3 cartography envy I had/have as an R user. Don’t get me wrong, I can conjure up D3 maps pretty well [1] [2] and the utility of an interactive map visualization goes without saying, but we can make great static maps in R without a great deal of effort, so I decided to replicate a few core examples from the D3 topojson gallery in R.
I chose five somewhat different examples, each focusing on various aspects of creating map layers and trying to not be to U.S. focused. Here they are (hit the main link to go to the gist for the example and the bl.ocks
URL to see it’s D3 counterpart):
- Swiss Cantons – R version of http://bl.ocks.org/mbostock/4207744
- Costa Rica – R version of http://bl.ocks.org/arce/9357998
- Area Choropleth – R version of http://bl.ocks.org/mbostock/4206573
- Block Counties – R version of http://bl.ocks.org/mbostock/4206857
- County Circles (OK, Ovals) – R version of http://bl.ocks.org/mbostock/4206975
I used the TopoJSON/GeoJSON files provided with each example, so you’ll need a recent gdal (>= 1.11), and—consequently—a suitable build of rgdal
) to work through the examples.
The Core Mapping Idiom
While the details may vary with each project you work on, the basic flow to present a map in R with ggplot
are:
- read in a map features (I use
readOGR
in these examples) - convert that into something
ggplot
can handle - identify values you wish to pair with those features (optional if we’re just plotting a plain map)
- determine which portion of the is to be displayed
- plot the map features
Words & abbreviations mean things, just like map symbols mean things, and if you’re wondering what this “OGR” is, here’s the answer from the official FAQ:
OGR used to stand for OpenGIS Simple Features Reference Implementation. However, since OGR is not fully compliant with the OpenGIS Simple Feature specification and is not approved as a reference implementation of the spec the name was changed to OGR Simple Features Library. The only meaning of OGR in this name is historical. OGR is also the prefix used everywhere in the source of the library for class names, filenames, etc.
The readOGR
function can work with a wide variety of file formats and OGR files can hold a wide variety of data. The most basic use for our mapping is to read in these TopoJSON/GeoJSON files and use the right features from them to make our maps. Features/layers can be almost anything (counties, states, countries, rivers, lakes, etc) and we can see what features we want to work with by using the ogrListLayers
function (you can do this from an operating system command line as well, but we’ll stay in R for now). Let’s take a look at the layers available in the map from the Costa Rica example:
ogrListLayers("division.json") ## [1] "limites" "provincias" "cantones" "distritos" attr(,"driver") ## [1] "GeoJSON" attr(,"nlayers") ## [1] 4 |
Those translate to “country”, “provinces”, “cantons”, & “districts”. Each layer has polygons and associated data for the polygons (and overall layer), including information about the type of projection. If you’re sensing a “math trigger warning”, fear not; I won’t be delving into to much more cartographic detail as you probably just want to see the maps & code.
Swiss Cantons
If you’re from the U.S. you (most likely) have no idea what a canton is. The quickest explanation is that it is an administrative division within a country and, in this specific example, the 26 cantons of Switzerland are the member states of the federal state of Switzerland.
The D3 Swiss Cantons uses a TopoJSON/GeoJSON file that has only one layer (i.e. the cantons) along with metadata about the canton id
and name
:
ogrInfo("readme-swiss.json", "cantons") ## Source: "readme-swiss.json", layer: "cantons" ## Driver: GeoJSON number of rows 26 ## Feature type: wkbPolygon with 2 dimensions ## Extent: (5.956 45.818) - (10.492 47.808) ## Number of fields: 2 ## name type length typeName ## 1 id 4 0 String ## 2 name 4 0 String |
NOTE: you should learn to get pretty adept with the OGR functions or command-line tools as you can do some really amazing things with them, including extracting only certain features, simplifying the polygons or fixing issues. Some of the TopoJSON/GeoJSON files you’ll find with D3 examples may have missing or invalid components and you can fix some of them with these tools. We’ll be working around errors and missing values in these examples.
The D3 example displays the canton name
at the centroid of the polygon, so that’s what we’ll do in R:
library(rgeos) library(rgdal) # needs gdal > 1.11.0 library(ggplot2) # ggplot map theme devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df") map = readOGR("readme-swiss.json", "cantons") map_df <- fortify(map) |
The map
object is a SpatialPolygonsDataFrame
and has a fairly complex structure:
slotNames(map) ## [1] "data" "polygons" "plotOrder" "bbox" ## [5] "proj4string" names(map) ## [1] "id" "name" # execute these on your own and poke around the data structures after determining the class class(map@data) class(map@polygons) class(map@plotOrder) class(map@bbox) class(map@proj4string) |
The fortify
function turns all that into something we can use with ggplot
. Normally, we’d be able to get fortify
to associate the canton name
to the polygon points it encodes via the region
parameter. That did not work with these TopoJSON/GeoJSON files and I didn’t really poke around much to determine why since it’s easy enough to work around. In this case, I manually merged the names with the fortified map data frame.
# create mapping for id # to name since "region=" won't work dat <- data.frame(id=0:(length(map@data$name)-1), canton=map@data$name) map_df <- merge(map_df, dat, by="id") |
We can get the centroid via the gCentroid
function, and we’ll make a data frame of those center points and the name of the canton for use with a geom_text
layer after plotting the base outlines of the cantons (with a rather bland fill, but I didn’t pick the color):
# find canton centers centers <- data.frame(gCentroid(map, byid=TRUE)) centers$canton <- dat$canton # make a map! gg <- ggplot() gg <- gg + geom_map(data=map_df, map=map_df, aes(map_id=id, x=long, y=lat, group=group), color="#ffffff", fill="#bbbbbb", size=0.25) # gg <- gg + geom_point(data=centers, aes(x=x, y=y)) gg <- gg + geom_text(data=centers, aes(label=canton, x=x, y=y), size=3) gg <- gg + coord_map() gg <- gg + labs(x="", y="", title="Swiss Cantons") gg <- gg + theme_map() |
The coord_map()
works with the mapproj
package to help us display maps in reasonable projections (or really dumb ones). The default is "mercator"
and we’ll stick with that since the D3 examples use it (but, winkel-tripel FTW!).
Here’s the result of our hard work (select map for larger version):
If you ignore the exposition above and just take into account non-blank source code lines, we did all that in ~16LOC and have a scaleable SVG file as a result. You can have some fun with the above code and remove the static fill="#bbbbbb"
and move it to the mapping
aesthetic parameter and tie it’s value to the canton name.
Costa Rica
The TopoJSON/GeoJSON file provided with the D3 example is a good example of encoding multiple layers into a single file (see the first ogrListLayers
above). We’ll create a fortified version of each layer and then plot each with a geom_map
layer using different line colors, sizes and fills:
limites = readOGR("division.json", "limites") provincias = readOGR("division.json", "provincias") cantones = readOGR("division.json", "cantones") distritos = readOGR("division.json", "distritos") limites_df <- fortify(limites) cantones_df <- fortify(cantones) distritos_df <- fortify(distritos) provincias_df <- fortify(provincias) gg <- ggplot() gg <- gg + geom_map(data=limites_df, map=limites_df, aes(map_id=id, x=long, y=lat, group=group), color="white", fill="#dddddd", size=0.25) gg <- gg + geom_map(data=cantones_df, map=cantones_df, aes(map_id=id, x=long, y=lat, group=group), color="red", fill="#ffffff00", size=0.2) gg <- gg + geom_map(data=distritos_df, map=distritos_df, aes(map_id=id, x=long, y=lat, group=group), color="#999999", fill="#ffffff00", size=0.1) gg <- gg + geom_map(data=provincias_df, map=provincias_df, aes(map_id=id, x=long, y=lat, group=group), color="black", fill="#ffffff00", size=0.33) gg <- gg + coord_map() gg <- gg + labs(x="", y="", title="Costa Rica TopoJSON") gg <- gg + theme_map() |
The result is pretty neat and virtually identical to the D3 version:
Try playing around with the order of the geom_map
layers (or remove some) and also the line color/size/fill & alpha values to see how it changes the map.
Area Choropleth
I’m not a huge fan of the colors used in the D3 version and I’m not going to spend any time moving Hawaii & Alaska around (that’s a whole different post). But, I will show how to make a similar area choropleth:
# read in the county borders map = readOGR("us.json", "counties") # calculate (well retrieve) the area since it's part of the polygon structure # and associate it with the polygon id so we can use it later. We need to do # the merge manually again since the "us.json" file threw errors again when # trying to use the fortify "region" parameter. map_area <- data.frame(id=0:(length(map@data$id)-1), area=sapply(slot(map, "polygons"), slot, "area") ) # read in the state borders states = readOGR("us.json", "states") states_df <- fortify(states) # create map data frame and merge area info map_df <- fortify(map) map_df <- merge(map_df, map_area, by="id") gg <- ggplot() # thin white border around counties and shades of yellow-green for area (log scale) gg <- gg + geom_map(data=map_df, map=map_df, aes(map_id=id, x=long, y=lat, group=group, fill=log1p(area)), color="white", size=0.05) # thick white border for states gg <- gg + geom_map(data=states_df, map=states_df, aes(map_id=id, x=long, y=lat, group=group), color="white", size=0.5, alpha=0) gg <- gg + scale_fill_continuous(low="#ccebc5", high="#084081") # US continental extents - not showing alaska & hawaii gg <- gg + xlim(-124.848974, -66.885444) gg <- gg + ylim(24.396308, 49.384358) gg <- gg + coord_map() gg <- gg + labs(x="", y="", title="Area Choropleth") gg <- gg + theme_map() gg <- gg + theme(legend.position="none") |
Play with the colors and use different values instead of the polygon area (perhaps use sample
or runif
to generate some data) to see how it changes the choropleth outcome.
Blocky Counties
The example from the D3 wiki is more “how to work with shapefiles and map coordinates” than it is useful, but we have the same flexibility in R, so we’ll make the same plot by using the bbox
function to make a data frame of bounding boxes we can use with geom_rect
(there’s no geom_map
in this example, just using the coordinate system to plot boxes):
# use the topojson from the bl.ocks example map = readOGR("us.json", "counties") # build our map data frame of rects map_df <- do.call("rbind", lapply(map@polygons, function(p) { b <- bbox(p) # get bounding box of polygon and put it into a form we can use later data.frame(xmin=b["x", "min"], xmax=b["x", "max"], ymin=b["y", "min"], ymax=b["y", "max"]) })) map_df$id <- map$id # add the id even though we aren't using it now gg <- ggplot(data=map_df) gg <- gg + geom_rect(aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax), color="steelblue", alpha=0, size=0.25) # continental us only gg <- gg + xlim(-124.848974, -66.885444) gg <- gg + ylim(24.396308, 49.384358) gg <- gg + coord_map() gg <- gg + labs(x="", y="", title="Blocky Counties") gg <- gg + theme_map() gg <- gg + theme(legend.position="none") |
To re-emphasize we’re just working with ggplot
layers, so play around and, perhaps color in only the odd numbered counties.
County Circles (OK, Ovals)
The last D3 example I’m copying swaps squares for circles, which makes this more of a challenge to do in R+ggplot since ggplot
has no “circle” geom (and holey geom_point
s do not count). So, we’ll borrow and slightly adapt a function from StackOverflow by joran that builds a data frame of polygon points derived by a center & diameter. We’ll add an id
value (for each of the counties) and make one really big data frame (well, big for use in ggplot
) that we can then plot as grouped geom_path
s. Unlike our cantons example, the gCentroid
function coughed up errors on this TopoJSON/GeoJSON file, so I resorted to approximating the center from the rectangular bounding box. Also, I don’t project the circle coordinates before plotting, so they’re ovals. While it doesn’t mirror the D3 example perfectly, it should help reinforce how to work with the map’s metadata and draw arbitrary components on a map:
# adapted from http://stackoverflow.com/questions/6862742/draw-a-circle-with-ggplot2 # computes a circle from a given diameter. we add "id" so we can have one big # data frame and group them for plotting circleFun <- function(id, center = c(0,0),diameter = 1, npoints = 100){ r = diameter / 2 tt <- seq(0,2*pi,length.out = npoints) xx <- center[1] + r * cos(tt) yy <- center[2] + r * sin(tt) return(data.frame(id=id, x = xx, y = yy)) } # us topojson from the bl.ocks example map = readOGR("us.json", "counties") # this topojson file gives rgeos_getcentroid errors here # so we approximate the centroid map_df <- do.call("rbind", lapply(map@polygons, function(p) { b <- bbox(p) data.frame(x=b["x", "min"] + ((b["x", "max"] - b["x", "min"]) / 2), y=b["y", "min"] + ((b["y", "max"] - b["y", "min"]) / 2)) })) # get area & diameter map_df$area <- sapply(slot(map, "polygons"), slot, "area") map_df$diameter <- sqrt(map_df$area / pi) * 2 # make our big data frame of circles circles <- do.call("rbind", lapply(1:nrow(map_df), function(i) { circleFun(i, c(map_df[i,]$x, map_df[i,]$y), map_df[i,]$diameter) })) gg <- ggplot(data=circles, aes(x=x, y=y, group=id)) gg <- gg + geom_path(color="steelblue", size=0.25) # continental us gg <- gg + xlim(-124.848974, -66.885444) gg <- gg + ylim(24.396308, 49.384358) gg <- gg + coord_map() gg <- gg + labs(x="", y="", title="County Circles (OK, Ovals)") gg <- gg + theme_map() gg <- gg + theme(legend.position="none") |
If you poke around a bit at the various map libraries in R, you should be able to figure out how to get those plotted as circles (and learn alot in the process).
Wrapping Up
R ggplot
maps won’t and shouldn’t replace D3 maps for many reasons, paramount of which is interactivity. The generated SVG files are also fairly large and the non-SVG versions don’t look nearly as crisp (and aren’t as flexible). However, this should be a decent introductory primer on mapping and shapefiles and might come in handy when you want to use R to enhance maps with other data and write out (yep, R can read and write OGR) your own shapefiles for use in D3 (or other tools/languages).
Don’t forget that all source code (including TopoJSON/GeoJSON files and sample SVGs) are in their own gists:
If you figure out what is causing some of the errors I mentioned or make some epic maps of your own, don’t hesitate to drop a note in the comments.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.