Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It’s exciting when great people help each other get things done
This is a simple networking story, which may not be surprising to some, but I was happily surprised by it. This is how the story goes:
Two weeks ago rMaps (Vaidyanathan, 2014) was released. After making a blog post about it, I thought about using it to make a map of the homicide rate in Mexico over recent years. First, I had the question of how to make custom maps with rMaps. @tyokota had the same question and started asking Ramnath about it in rMaps issue 6. Then I realized I needed a specific file with the map information. Google led me to @diegovalle, who has created the map from official Mexican sources, downloaded the homicide data, cleaned it, and made several maps and analyses: all his work is very impressive! I thought that it’d be very cool if @diegovalle and Ramnath connected, and they did! I saw them interacting via Twitter (here and here) and via GitHub. After sharing @diegovalle’s work with my friends, it turned out that some old friends already knew him (here and high school friends). Another friend was interested in additional features and I suggested that she contact @diegovalle via Twitter: he quickly replied, as you can see here.
Beyond how impressive rMaps and @diegovalle’s work on Mexican data are, I was amazed by the willingness to help each other and by how easily great people connected. I believe this is one of the great features of both GitHub and Twitter, where you can share your code, ask questions, try to answer them, meet people working with your tools, etc. You can even offer to PayPal a beer like @tyokota did.
After all their great work, now someone like me (that is, someone who doesn’t know JavaScript, Datamaps, etc.) can walk you through an example of making an interactive choropleth map showing the homicide rates in Mexico from 1997 to 2013.
Homicide rates in Mexico 1997-2013
The first thing we need to make a custom map using rMaps is a topojson file, which in this case specifies the Mexican state boundaries. This process is explained in more detail by @tyokota at custom-map, which you can view here.
In this particular example, INEGI, which is the National Institute of Statistics and Geography of Mexico, has a map of the Mexican states. @diegovalle explained how to download it here.
But before doing so, you might need to install topojson, like I did below following the installation instructions. In the terminal:
## Install node.js following instructions at https://github.com/mbostock/topojson/wiki/Installation
brew install node
## Install topojson
npm install -g topojson
## Download map info from INEGI (Mexican official source)
curl -o estados.zip http://mapserver.inegi.org.mx/MGN/mge2010v5_0.zip
## Decompress file
unzip estados.zip
## Create shapefile
ogr2ogr states.shp Entidades_2010_5.shp -t_srs "+proj=longlat +ellps=WGS84 +no_defs +towgs84=0,0,0"
## id-property needed so that DataMaps knows how to color the map
topojson -o mx_states.json -s 1e-7 -q 1e5 states.shp -p state_code=+CVE_ENT,name=NOM_ENT --id-property NOM_ENT
Now that we have the topojson file mx_states.json, we need to get the actual homicide data. @diegovalle has gone through the whole process of acquiring the data from official Mexican sources and cleaning it. Let’s download it.
# Download crime data
## From crimenmexico.diegovalle.net/en/csv
## All local crimes at the state level
download.file("http://crimenmexico.diegovalle.net/en/csv/fuero-comun-estados.csv.gz", "fuero-comun-estados.csv.gz")
The data is not quite ready for us to use, so we need to reshape it a bit. In particular, we want to consider only the intentional homicides and group the data by state and date. We can get this to work by using dplyr (Wickham & Francois, 2014).
## Load required packages
library("dplyr")
## Load the crime data
crime <- read.csv("fuero-comun-estados.csv.gz")
## Only intentional homicides
crime <- subset(crime, crime == "HOMICIDIOS" & type == "DOLOSOS")
## Sum homicides by firearm, etc and group by state and date
hom <- crime %.%
  filter(year %in% 1997:2013) %.%
  group_by(state_code, year, type) %.%
  summarise(total = sum(count, na.rm = TRUE),
            population = mean(population)) %.%
  mutate(rate = total / population * 10^5) %.%
  arrange(state_code, year)
## How are states coded?
summary(hom$state_code)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    8.75   16.50   16.50   24.20   32.00
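To see what the summarise and mutate steps compute without downloading the real data, here is a minimal base R sketch of the same sum-then-rate calculation. The `toy` data frame and its values are made up for illustration, and `aggregate()` and `merge()` stand in for the dplyr pipeline rather than reproduce it exactly. It may also be handy on newer dplyr releases, where `%>%` is used instead of `%.%`.

```r
## Toy stand-in for the filtered crime data (all values are made up)
toy <- data.frame(
  state_code = c(1, 1, 2, 2),
  year       = c(1997, 1997, 1997, 1997),
  count      = c(10, 20, 5, 0),
  population = c(1e6, 1e6, 5e5, 5e5)
)

## Sum the counts within each state and year
totals <- aggregate(count ~ state_code + year, data = toy, FUN = sum)
## Average the population within each state and year
pops <- aggregate(population ~ state_code + year, data = toy, FUN = mean)

## Combine both summaries and compute homicides per 100,000 people
toy_hom <- merge(totals, pops)
toy_hom$rate <- toy_hom$count / toy_hom$population * 10^5
toy_hom
```

Each row of `toy_hom` plays the role of one state-year row of `hom`, with `count` corresponding to `total`.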
We have the slight inconvenience that states are coded as integers from 1 to 32 instead of using their names. Using another of the files supplied by @diegovalle, we can merge the codes with the names. This requires using the foreign (R Core Team) package for loading a dbf file and then merging both data sets with plyr (Wickham, 2011).
## Needed for read.dbf
library("foreign")
## The dbf from the state shapefile needed to merge state_code with state
## names
codes <- read.dbf("states.dbf")
codes$NOM_ENT <- iconv(codes$NOM_ENT, "windows-1252", "utf-8")
codes$CVE_ENT <- as.numeric(codes$CVE_ENT)
codes$OID <- NULL
names(codes) <- c("state_code", "name")
## Load plyr for join(). Loading it before creates a problem with the dplyr
## call above
library("plyr")
## Names needed for the map
hom <- join(hom, codes)
## Lets look at the data
head(hom)
##   state_code year    type total population   rate           name
## 1          1 1997 DOLOSOS   355     958126 37.051 Aguascalientes
## 2          1 1998 DOLOSOS    66     975585  6.765 Aguascalientes
## 3          1 1999 DOLOSOS    27     992515  2.720 Aguascalientes
## 4          1 2000 DOLOSOS    14    1009215  1.387 Aguascalientes
## 5          1 2001 DOLOSOS    22    1026437  2.143 Aguascalientes
## 6          1 2002 DOLOSOS    26    1044578  2.489 Aguascalientes
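One detail in the code above is worth a hedged note. By default, read.dbf() returns character fields as factors, and as.numeric() on a factor gives the position of each level, not the number written in the label. With all 32 zero-padded codes present, positions and codes presumably coincide, which would explain why the call works here. A minimal sketch of the difference, using a made-up factor `cve` in place of codes$CVE_ENT:

```r
## Made-up factor standing in for codes$CVE_ENT with only three states
cve <- factor(c("05", "09", "32"))

## Level positions, not the codes themselves
positions <- as.numeric(cve)
positions

## Going through as.character() recovers the codes written in the labels
state_codes <- as.numeric(as.character(cve))
state_codes
```

If the dbf ever held a subset of states, the second form would be the safer choice.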
Great! We now have state names under name and the intentional homicide rate under rate (in homicides per 100,000) for each specific year. We can thus proceed to making the interactive choropleth map using the ichoropleth function described by Ramnath here. This requires specifying the topojson file via dataUrl, the name of the map via scope and, the trickiest part (for me at least), the setProjection function. These are all properties of the Datamaps library. In particular, the wiki describes how to use custom maps, but this requires some JavaScript knowledge.
## Make the map
library("rMaps")
d1 <- ichoropleth(rate ~ name, data = hom, ncuts = 9, pal = 'YlOrRd',
  animate = 'year', map = 'states'
)
## Note that I am hosting the mx_states.json in Dropbox
## but if you are doing it locally, you only need
## dataUrl = "mx_states.json"
d1$set(
  geographyConfig = list(
    dataUrl = "https://dl.dropboxusercontent.com/u/10794332/mx_states.json"
  ),
  scope = 'states',
  setProjection = '#! function( element, options ) {
    var projection, path;
    projection = d3.geo.mercator()
      .center([-89, 21]).scale(element.offsetWidth)
      .translate([element.offsetWidth / 2, element.offsetHeight / 2]);
    path = d3.geo.path().projection( projection );
    return {path: path, projection: projection};
  } !#'
)
d1$save('rMaps.html', cdn = TRUE)
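Since the topojson file was built with --id-property NOM_ENT and the map is drawn from rate ~ name, the coloring presumably relies on the state names in the data matching the ids in the map exactly, accents included. A quick way to spot mismatches is sketched below. Both vectors are made-up examples, not values read from mx_states.json or from hom.

```r
## Hypothetical ids as they might be stored in the topojson file
## ("\u00e1" is the accented a in Yucatan's Spanish spelling)
map_ids <- c("Aguascalientes", "Baja California", "Yucat\u00e1n")
## Hypothetical state names as they might appear in the data
data_names <- c("Aguascalientes", "Baja California", "Yucatan")

## Names in the data that have no exact match among the map ids
unmatched <- setdiff(data_names, map_ids)
unmatched
```

An empty result would suggest every state in the data has a counterpart in the map. Anything listed is a candidate for an encoding or spelling fix.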
The end result is shown below:
You can also share the map using the publish method as shown below:
d1$publish("Intentional homicides rates for Mexico 1997-2013") ## You'll need a GitHub account
You will get a link to the rCharts viewer such as this one or if you prefer, you can also view the result using Pagist as shown here.
The code presented in this post was written by @diegovalle (which you can view here) and Ramnath (which is shown here). I also figured out the trick of hosting the topojson file on Dropbox from @tyokota’s code, as I was running into Access-Control-Allow-Origin errors when hosting it on my academic website. Last but not least, Ramnath appropriately insists that all of this would not be possible without libraries such as Datamaps.
References
Citations made with knitcitations (Boettiger, 2014).
- Hadley Wickham, Romain Francois, (2014) dplyr: a grammar of data manipulation. http://CRAN.R-project.org/package=dplyr
- R Core Team, (2014) foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, …. http://CRAN.R-project.org/package=foreign
- Carl Boettiger, (2014) knitcitations: Citations for knitr markdown files. http://CRAN.R-project.org/package=knitcitations
- Hadley Wickham, (2011) The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 40 (1) 1-29 http://www.jstatsoft.org/v40/i01/
- Ramnath Vaidyanathan, (2014) rMaps: Interactive Maps from R.
Reproducibility
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] rMaps_0.1.1         plyr_1.8            foreign_0.8-59
## [4] dplyr_0.1.1         knitcitations_0.5-0 bibtex_0.3-6
## [7] knitr_1.5
##
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1     digest_0.6.4       evaluate_0.5.1
##  [4] formatR_0.10       grid_3.0.2         httr_0.2
##  [7] lattice_0.20-24    rCharts_0.4.2      RColorBrewer_1.0-5
## [10] Rcpp_0.11.0        RCurl_1.95-4.1     RJSONIO_1.0-3
## [13] stringr_0.6.2      tools_3.0.2        whisker_0.3-2
## [16] XML_3.95-0.2       xtable_1.7-1       yaml_2.1.10