Maps in R: choropleth maps
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, in this installment the creation of a choropleth map will be presented.
A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.
Packages used
We make use of three packages (plus their dependencies) to produce the maps in this post.
maptools
: a set of tools for reading and handling spatial objects. In particular, it will be used to read .shp files, a common format used by geographic information systems software.ggplot2
: one of the powerful graphics engines available to R usersggmap
: a package for spatial visualization using popular on-line mapping systems, such as GoogleMaps or OpenStreetMap.
So, let’s load them into R.
> library(maptools) > library(ggplot2) > library(ggmap) |
Data used
In order to produce the maps displayed later in this post, we used data publicly available from Eurostat.
The polygons for drawing the administrative boundaries were obtained from this link. In particular, the NUTS 2010 shapefile in the 1:60 million scale was downloaded and used. The other available scales would allow the drawing of better defined maps, but at a computational cost. The zipped file has to be extracted in a folder of choice for using it later.
Note: for an introduction on NUTS (Nomenclature of territorial units for statistics) classification, look at this Introduction on Eurostat website.
The statistic that will be drawn in the map is the the public expenditure on education as a percentage of the Gross Domestic Product (year 2009), and is one of the many indicators available for download at Eurostat’s statistics portal.
In order to make the following code fully reproducible, I’m uploading below the .csv file I obtained after performing the various selections on the Eurostat portal.
csv file: educ_thexp_1_Data
With the following commands we read both the shapefile and the csv in R.
# read administrative boundaries (change folder appropriately) > eurMap eurEdueurEdu$Value |
Data preparation
The eurMap
object could potentially be used as is to produce a map. In fact, a command as simple as plot(eurMap)
would draw the map of Europe at the NUTS-3 level.
However, since we are going to make use of the great versatility of the ggplot2
graphics facility, we need to transform eurMap
to a data frame, as that is the only type of object accepted by the ggplot()
function. The ggplot2
package has the fortify()
function that takes care of said conversion.
# convert shp data to data frame for ggplot > eurMapDf |
After that, we can add the data on education expenditure to the newly created data frame. Note that, since administrative borders will be plotted as paths, it’s important to have them ordered as they have to be drawn.
# merge map and data > eurEduMapDfeurEduMapDf |
Finally, to avoid the plotting of European territories far away from mainland Europe, we put some coordinates limits to the data, using the geocode()
function we introduced in previous posts.
# limit data to main Europe > europe.limits eurEduMapDf min(europe.limits$lon) & long < max(europe.limits$lon) & lat > min(europe.limits$lat) & lat < max(europe.limits$lat)) |
Mapping with ggplot2
While not specifically designed for the purpose, the ggplot2
package is well suited for the display of maps. Let’s first create an empty map (just showing borders,) by adding a geom_path
layer to a ggplot()
call.
# ggplot mapping # data layer > m0m1 |
The map in the following image is obtained by typing m1
in the R console.
Then, by adding a geom_polygon
layer, we color the country area according to the education expenditure value (stored in the Value
column.)
# fill with education expenditure data > m2 |
And here is the result of typing m2
in the R console.
The country borders are not visible anymore because the filled polygons have been stacked on top of them. Changing the order of layers will produce a more readable map.
# inverse order (to have visible borders) > m0 m1 m2 m2 |
Overlay on GoogleMaps
Polygon and path layers can be overlaid on a map obtained from GoogleMaps (or OpenStreetMaps) using the ggmap
package introduced in the previous posts.
It must be noted that the overlaying will work only if the boundary files have the same coordinate system and the same projection of the underlying map. In this case, the combination of Eurostats shapefiles and GoogleMaps works fine.
# over a GoogleMap (not working if not correctly projected) > map m0 m1m2 |
As a final touch, let’s put text on the map. Using the summaryBy()
funcion of the doBy
package, we calculate average values for multiple variables. The solution of using the average coordinates of all boundary points as the position where to place the text is a lazy one, as appropriate ways to calculate the center of mass of a polygon should be used instead.
# add text > library(doBy) > txtValm3 |
Where to find Shapefiles
If you are looking for shapefiles, your best choice is probably googling for them. However, here are a few places where you can find them.
- The Eurostat page on Administrative and statistical units mentioned at the beginning of this post.
- ISTAT’s cartography page for Italian administrative units. Also contains geocoded data points of ports, police stations and many other things.
- The Global Administrative Areas website has worldwide boundaries at various levels in several formats.
- CShapes features historical boundaries going back to 1946, thus useful for displaying older data. They can be downloaded as shapefiles, but
chsapes
also comes as an R package.
View (and download) the full code:
library(maptools) library(ggplot2) library(ggmap) # read administrative boundaries (change folder appropriately) eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010") # read downloaded data (change folder appropriately) eurEdu <- read.csv("educ_thexp_1_Data.csv", stringsAsFactors = F) eurEdu$Value <- as.double(eurEdu$Value) #format as numeric # merge map and data eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO") eurEduMapDf <- eurEduMapDf[order(eurEduMapDf$order),] #limit data to main Europe europe.limits <- geocode(c("Cape Fligely, Rudolf Island, Franz Josef Land, Russia", "Gavdos, Greece", "Faja Grande, Azores", "Severny Island, Novaya Zemlya, Russia")) eurEduMapDf <- subset(eurEduMapDf, long > min(europe.limits$lon) & long < max(europe.limits$lon) & lat > min(europe.limits$lat) & lat < max(europe.limits$lat)) # ggplot mapping # data layer m0 <- ggplot(data=eurEduMapDf) # empty map (only borders) m1 <- m0 + geom_path(aes(x=long, y=lat, group=group), color='gray') + coord_equal() # fill with education expenditure data m2 <- m1 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value)) # inverse order (to have visible borders) m0 <- ggplot(data=eurEduMapDf) m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value)) + coord_equal() m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), color='black') m2 # over a GoogleMap (not working if not correctly projected) map <- get_map(location = 'Europe', zoom=4) m0 <- ggmap(map) m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value), data=eurEduMapDf, alpha=.9) m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), data=eurEduMapDf, color='black') # add text library(doBy) txtVal <- summaryBy(long + lat + Value ~ id, data=eurEduMapDf, FUN=mean, keep.names=T) m3 <- m2 + geom_text(aes(x=long, y=lat, label=Value), data=txtVal, col="yellow", cex=3) |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.