Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Anyone who’s spent any time around data knows primary keys are your friend. Enter the FIPS code. FIPS is the Federal Information Processing Standard and appears in most data sets published by the US government.
Name Matching
The map below is an example of the “wrong way” to do something like this. It uses string matching to pair the US county names in the data with the county names in the maps package. The map can be replicated with this GitHub gist, but I don’t recommend it. I’m using an air quality data set from the Centers for Disease Control and Prevention.
Problems with string matching:
County names change
Louisiana has “parishes” rather than counties
Alaska has “boroughs” and “census areas” rather than counties
As you can see from above, many counties are missing. It’s possible to fix this with some fancy regex work, but it may take quite some time before you realize why Oglala Lakota County is missing from your base map!
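To make the failure mode concrete, here is a minimal sketch in base R. The two toy data frames are made up for illustration (they are not the CDC data or the actual tables in the maps package), and the measurement values are invented. The point is only that the same counties, spelled differently in each source, fail to join on names but join cleanly on a code.

```r
# Toy example: the same three counties, named differently in each source.
# The measurement values are made up for illustration.
air <- data.frame(
  fips   = c("22071", "02020", "29189"),
  county = c("Orleans Parish", "Anchorage Municipality", "St. Louis County"),
  value  = c(12.3, 8.7, 11.4),
  stringsAsFactors = FALSE
)

base_map <- data.frame(
  fips   = c("22071", "02020", "29189"),
  county = c("orleans", "anchorage", "st louis"),
  stringsAsFactors = FALSE
)

# Matching on names: nothing lines up without case folding, suffix
# stripping, punctuation handling, and so on.
by_name <- merge(air, base_map, by = "county")

# Matching on FIPS: every row finds its partner.
by_fips <- merge(air, base_map, by = "fips")

nrow(by_name)
nrow(by_fips)
```

Note that the FIPS codes are stored as character strings here, so that a leading zero (as in Alaska's codes) is not lost.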
FIPS Matching
The maps package contains a built-in data set that you can call with `county.fips`. The only problem is that you still end up string matching against your data set. I’ve found the best way to get a map with baked-in FIPS codes is to download one of the many shape files provided by the Census Bureau. NOTE: There are also shape files for ZIP codes, congressional districts, census tracts, etc. The shape file we’re using can be downloaded here. Just unzip it and place it in your working directory.
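Once the shape file is read in, the join key usually has to be assembled or cleaned up on one side or the other. The sketch below uses a small stand-in data frame in place of the shape file's attribute table (which you would get from something like `sf::st_read()`). The column names `STATEFP` and `COUNTYFP` are an assumption about how the file is laid out, so check `names()` on your own file first.

```r
# Stand-in for the attribute table of a county shape file.
# Column names STATEFP / COUNTYFP are an assumption; check names() on your file.
shape_attrs <- data.frame(
  STATEFP  = c("01", "06", "48"),
  COUNTYFP = c("001", "037", "201"),
  stringsAsFactors = FALSE
)

# Five-digit county FIPS = two-digit state code + three-digit county code.
shape_attrs$fips <- paste0(shape_attrs$STATEFP, shape_attrs$COUNTYFP)

# Data sets often store FIPS as a number, which drops the leading zero.
data_fips_numeric <- c(1001, 6037, 48201)

# Pad back to five characters so both sides use the same text key.
data_fips <- sprintf("%05d", as.integer(data_fips_numeric))

all(data_fips %in% shape_attrs$fips)
```

Keeping both sides as zero-padded text avoids the quiet mismatches that happen when one side is numeric and the other is character.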
Note that these data don’t contain values for Alaska and Hawaii.
Results
The data are taken from 2000 to 2010, and the shape file we’re using is from 2013. But since FIPS codes generally stay the same even when county names change, everything matches up just fine.
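A quick way to confirm that everything really does match is to compare the two sets of codes before drawing anything. The following is a rough sketch with made-up readings and hypothetical column names (`fips`, `year`, `value`). It collapses the yearly data to one mean per county, which is only one possible summary and not necessarily how the map here was built, and then lists any codes without a partner.

```r
# Hypothetical yearly readings keyed by FIPS (values made up).
readings <- data.frame(
  fips  = c("01001", "01001", "06037", "06037", "48201"),
  year  = c(2000, 2010, 2000, 2010, 2005),
  value = c(10, 12, 20, 18, 15),
  stringsAsFactors = FALSE
)

# One value per county: the mean over all available years.
county_means <- aggregate(value ~ fips, data = readings, FUN = mean)

# FIPS codes present in the (stand-in) shape file.
shape_fips <- c("01001", "06037", "48201", "02020")

# Codes in the data with no polygon, and polygons with no data.
missing_polygons <- setdiff(county_means$fips, shape_fips)
missing_data     <- setdiff(shape_fips, county_means$fips)

missing_polygons
missing_data
```

An empty `missing_polygons` means every row of data has somewhere to be drawn; anything in `missing_data` is a county that will show up blank, as Alaska and Hawaii do with this data set.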
Other Advantages
Leaflet, anyone? Another plus of shape files is that they are easily rendered as a Leaflet map. I threw the below map together “quick and dirty.” I’m not really pleased with the green-to-red color ramp, but I’m sure that could be fixed by manually assigning color buckets. ggplot2 seems to have a better handle on color ramping straight out of the box.
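For what it's worth, manually assigning color buckets can be sketched in base R along these lines. The values, break points, and colors below are all arbitrary choices made for illustration, not the ones behind the map above.

```r
# Made-up values standing in for the per-county measurements.
values <- c(3.2, 7.8, 9.9, 12.4, 15.1)

# Hand-picked break points and one color per bucket (both arbitrary here).
breaks  <- c(0, 5, 10, 15, Inf)
palette <- c("#ffffcc", "#a1dab4", "#41b6c4", "#225ea8")

# cut() assigns each value to a bucket; the bucket index picks the color.
bucket <- cut(values, breaks = breaks, labels = FALSE, right = FALSE)
fill   <- palette[bucket]

data.frame(values, bucket, fill)
```

A vector like `fill` could then be handed to the `fillColor` argument of leaflet's `addPolygons()`. The leaflet package also ships a `colorBin()` helper that builds a similar value-to-color mapping, which may be the more convenient route.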