Mapping with ggplot: Create a nice choropleth map in R
Posted on October 27, 2015 by Slawa Rokicki in R bloggers | 0 Comments
[This article was first published on R for Public Health, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I was working on making a map in R recently and after an extensive search online, I found a hundred different ways to do it and yet each way didn’t work quite right for my data and what I wanted to do. Eventually, I found a way to make it work after pulling together code from [this blog](http://blog.revolutionanalytics.com/2009/11/choropleth-challenge-result.html ), [this blog](http://www.kevjohnson.org/making-maps-in-r/ ), and [this stack overflow page](http://stackoverflow.com/questions/11052544/convert-map-data-to-data-frame-using-fortify-ggplot2-for-spatial-objects-in-r ). Combining all of those sources, along with a good amount of trial and error, has led to this blog post. This post shows how to use ggplot to map a chloropleth map from shapefiles, as well as change legend attributes and scale, color, and titles, and finally export the map into an image file. ### Preparing the data We need a number of packages to make this work, as you can see below so make sure to install and load those. We load in shapefiles using the **readShapeSpatial()** function from the maptools library. A shapefile is a geospatial vector data storage format for storing map attributes like latitude and longitude. You can download lots of shape files for free! For example, for any country in the world you can download shape files [here](http://www.diva-gis.org/gdata). Or just google it. My project is to graph some data on India so I have downloaded a shapefile with state boundaries of India and I will now load that into R. library(ggplot2) library(maptools) library(rgeos) library(Cairo) library(ggmap) library(scales) library(RColorBrewer) set.seed(8000) ##input shapefiles setwd(“/Users/Analysis/Shapefiles/”) The class of the loaded shapefiles are of a spatial class, and we can see what is in it by checking its names. You can print out these components to see what they are. For example, let’s print out the state names like this: print(states.shp$NAME_1) Now we’ll get the data we want to plot on the map. Here I’ll just make the data up, but you can import a csv of it or extract it from your model estimates. The important thing is that the id numbers are the same in the shape file as in your data you want to plot, because that will be necessary for them to merge properly (you could also merge by the name itself). ##create (or input) data to plot on map Now we need to merge the shapefile and the dataset together. First, we fortify() the shapefile (a function of ggplot) to get it into a dataframe. We need to include a region identifier so that ggplot keeps that id when it turns it into a dataframe; otherwise it will make it up and it will not be possible to merge properly. We can see that the class is now a common dataframe and each row is a longitude and latitude. Now we can merge the two dataframes together by the id variable, making sure to keep all the observations in the spatial dataframe. Importantly, we need to order the data by the “order” variable. You can see what happens if you don’t do this and it’s not pretty. Basic plot We’re ready to plot the data using ggplot(), along with geom_polygon() and coord_map(). Note that in the aes statement, group=group does NOT change; “group” is a variable in the fortified dataframe so just leave it. You do need to change “prevalence” to whatever variable you want to plot. ggplot() + geom_polygon(data = final.plot, aes(x = long, y = lat, group = group, fill = prevalence), color = “black”, size = 0.25) + coord_map() This is a great start, but there are a number of options that can make the map look much nicer. First, we can remove the background and gridlines by including **theme_nothing()** from the ggmap package, but indicating that we still want the legend. We can also change the title of the legend and add in a title for the whole map. Next, we can change the palette to “YlGn” which is Yellow to Green. You can find palette options as well as help on color in R [here](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf), or type in display.brewer.all() to see a plot of them in R. We can also use **pretty_breaks()** from the scales package to get nice break points (the pretty algorithm creates a sequence of about n+1 equally spaced round values, that are 1, 2, or 5 times a power of 10). If you don’t like the way pretty_breaks looks, you can change your scale with scale_fill_gradient() and set the limits and colors like this: scale_fill_gradient(name=”My var”, limits=c(0,100), low=”white”, high=”red”) I do this in the third map, below. More about using color gradients in R, [here](http://docs.ggplot2.org/current/scale_gradient.html). #nicer plot ggplot() + geom_polygon(data = final.plot, aes(x = long, y = lat, group = group, fill = prevalence), color = “black”, size = 0.25) + coord_map()+ scale_fill_distiller(name=”Percent”, palette = “YlGn”, breaks = pretty_breaks(n = 5))+ theme_nothing(legend = TRUE)+ labs(title=”Prevalence of X in India”) Other options: If we wanted to reverse the color to go from 0 being dark instead of light, we can just add in the option trans=”reverse” into the scale_fill_distiller statement (see the plot below). If we were plotting a discrete variable instead of a continuous one, we would use scale_fill_manual instead of distiller, like this: scale_fill_manual(name=”My discrete variable”, values=c(“red”, “blue”, “green”, “yellow”)) Additionally, we can put the names of the states on the map. This may or may not be a great idea depending on the size of the areas you’re mapping, but if it’s useful, you can do it easily using __geom_text()__. First we need to identify a longitude and latitude where you want to plot each state name. One way we can do this is using **aggregate()** to calculate the mean of the range of the longitude and latitude for each state. #aggregate data to get mean latitude and mean longitude for each state ggplot() + geom_polygon(data = final.plot, aes(x = long, y = lat, group = group, fill = prevalence), color = “black”, size = 0.25) + coord_map() + scale_fill_gradient(name=”Percent”, limits=c(0,70), low=”white”, high=”red”)+ theme_nothing(legend = TRUE)+ labs(title=”Prevalence of X in India”)+ geom_text(data=cnames, aes(long, lat, label = NAME_1), size=3, fontface=”bold”) This doesn’t look great on this particular map, but it may be useful in other settings. You could also use geom_text() to put other text on the map to identify certain features or highlight an area. Exporting map Finally, one way we can export the map is by using Cairo. For this to work on my mac, I had to install X11, found [here](http://xquartz.macosforge.org/landing/). We save the plot in an object and then use ggsave() to export the object as a png. First save the ggplot statement in an object, and then export with ggsave, indicating the width and height. You can also specify the dpi (default is 300). Another way to save to pdf is via the pdf function, also shown below. #save map ggsave(p2, file = “map1.png”, width = 6, height = 4.5, type = “cairo-png”) #another way to save pdf(“mymap.pdf”) print(p2) dev.off()
Related
To leave a comment for the author, please follow the link and comment on their blog: R for Public Health.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.