Introduction
On Tuesday, January 3rd, 2012, the Iowa Republican party held its presidential caucuses, with Mitt Romney beating Rick Santorum by 8 votes as of noon on January 4th. This was an exciting race with multiple lead changes and entrance polling showing many late undecideds and large gaps in candidate support by age and income. In a nice twist for those of us who like playing with election data (and who doesn’t?), the IA GOP has published their election results using Google Fusion Tables. This allows us to skip the usually messy data scraping and piecing together that is par for the course.
In this post I’ll create heat maps to illustrate candidate support geographically, as well as differential heat maps to compare results between candidates. Everything is done in R and all the code and data are available from the github repository for this post. Pull requests for new analysis or fixes are greatly appreciated.
Data Wrangling
Electoral data is usually published as some combination of PDFs, HTML files, CSV files, and ESRI shapefiles. There is usually a fair amount of work getting names and IDs to match up between formats, but not this time. Thankfully the IA GOP published their caucus results as a Google Fusion Table. This allows easy online viewing and export to other formats. You can ignore this section if you just want to hack on the provided result files. Otherwise, to produce the data files for this post I did the following:
- Exported a CSV copy of the IAGOP election results with File->Export in the Fusion Table.
- Exported a KML file with Iowa county outlines:
  - View the Iowa election map with Visualize->Map
  - Export the KML file by clicking the “Export to KML” link on the map page
- Loaded the KML file into Quantum GIS and exported the layer as a Shapefile.
That looks like a lot of work, and it was sort of complex, but it is miles easier than taking shape and result data from different sources, or performing all those steps in code.
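As a rough illustration of what happens once the CSV is exported, the sketch below turns raw vote counts into per-county percentages using base R only. The column names (`county`, `romney`, `santorum`, `paul`) and the numbers are invented for the example and are not the real export headers or results; an inline string stands in for the exported file.

```r
# Toy stand-in for the exported CSV. Column names and counts are
# invented for illustration and are NOT the real caucus results.
csv_text <- "county,romney,santorum,paul
Adair,50,100,50
Boone,300,200,500"

results <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Candidate columns are everything except the county name.
candidates <- setdiff(names(results), "county")

# Total votes per county across the candidates listed here.
results$total <- rowSums(results[, candidates])

# Convert raw counts to percentages of the county total.
pct <- round(100 * results[, candidates] / results$total, 1)
names(pct) <- paste0(candidates, "_pct")
results <- cbind(results, pct)

print(results)
```

With a real export you would replace `read.csv(text = csv_text)` with `read.csv()` on the downloaded file and adjust the column names to match.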
R code
The R code makes use of several awesome packages: maptools, ggplot2, RColorBrewer, and gpclib. I took code for preparing and plotting shapefiles with ggplot2 from the ggplot2 wiki. I lifted some ggplot2 theme code from Osmo Salomaa on the ggplot2 mailing list. This work would have been substantially more difficult without the packages and links listed above; thank you to their authors for making their work available for free.
Support Heat maps
First we’ll create heat maps of the vote percentage each candidate received by county in the Iowa GOP Caucuses in 2012. These maps show us strong and weak geographic areas for each candidate. The scales are identical for all candidates, so comparing maps should be quite easy. The maps follow in alphabetical order; click through for a much larger version:
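The post does not spell out how the shared scale is built, so the following is only one possible sketch: bin every candidate's percentages with the same set of breaks, so that a given color always covers the same range on every map. The data frame, the break points, and the helper `bin_pct` are all hypothetical.

```r
# Hypothetical vote percentages per county; not real results.
pct <- data.frame(
  county   = c("A", "B", "C"),
  romney   = c(12, 27, 41),
  santorum = c(55, 24, 8)
)

# One set of breaks shared by every candidate, so a given bin
# always means the same percentage range on every map.
breaks <- seq(0, 60, by = 10)

# Hypothetical helper: assign each percentage to a shared bin.
bin_pct <- function(x) cut(x, breaks = breaks, include.lowest = TRUE)

pct$romney_bin   <- bin_pct(pct$romney)
pct$santorum_bin <- bin_pct(pct$santorum)

print(pct)
```

Because both columns are cut with the same breaks, their factor levels are identical and can be mapped to one color palette. With a continuous ggplot2 fill scale, the usual equivalent is to give every plot the same scale limits.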
Bachmann
Gingrich
Paul
Perry
Romney
Santorum
Take away
What do these maps tell us? Bachmann, who finished last among these six candidates, performed poorly across the entire state. Perry, who finished 5th, had some strong counties in the southwest but overall performed poorly as well. Gingrich, the front runner until two weeks ago and the 4th place finisher, had no really strong counties anywhere. Ron Paul finished 3rd and enjoyed several counties with large support in the northeast of Iowa, and was a contender almost everywhere. Rick Santorum had one huge win and several other large county wins in the west, while Romney had more success in the eastern side of Iowa. Romney also had more counties where he won 30-40% of the vote than Santorum did.
Relative Heat maps
The raw support heat maps above are OK, but they aren’t terrific for comparing two candidates’ returns. Next we’ll create relative heat maps that show the difference in support by county for each pairing of the top 3 candidates. We’ll also create a histogram of the same values, since mapping the differences can sometimes distort the overall distribution of values.
Romney vs Santorum
Positive values show a greater Romney percentage than Santorum, and negative values show the opposite. For example, if Romney received 25% of the vote in a county and Santorum received 50%, the Romney - Santorum value would be -25.
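The difference calculation can be sketched in a few lines of base R. The percentages below are hypothetical (the first row mirrors the 25% vs 50% example), and the column name `romney_minus_santorum` and the histogram breaks are assumptions made for this illustration.

```r
# Hypothetical county percentages; row 1 mirrors the example above.
pct <- data.frame(
  county   = c("A", "B", "C", "D"),
  romney   = c(25, 30, 22, 35),
  santorum = c(50, 26, 24, 15)
)

# Positive = Romney ahead, negative = Santorum ahead.
pct$romney_minus_santorum <- pct$romney - pct$santorum

# Bin the differences the way a histogram would, without drawing it.
h <- hist(pct$romney_minus_santorum,
          breaks = seq(-30, 30, by = 10),
          plot = FALSE)

print(pct)
print(data.frame(from     = head(h$breaks, -1),
                 to       = tail(h$breaks, -1),
                 counties = h$counts))
```

Dropping `plot = FALSE` draws the histogram itself; the same difference column is what a relative heat map would use as its fill value.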
Analysis:
The relative heat map between the first and second place candidates in the IA GOP 2012 caucus shows us a few interesting items. Santorum cleaned up in the upper northwest of Iowa, did less well in the southwest, and had mixed results everywhere else. The histogram gives us a clearer picture of the overall breakdown of the results, showing us few large percentage wins for either candidate and a spike of smaller percentage wins for Santorum. The number of Santorum wins was probably offset by the size of the counties that Romney won, which isn’t shown on either of these graphs.
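To make the point about county size concrete, here is a small sketch with made-up vote counts in which one candidate wins more counties while the other wins more votes. The county names and numbers are purely hypothetical and say nothing about the actual Iowa results.

```r
# Made-up vote counts: three small counties and one large one.
votes <- data.frame(
  county   = c("Small1", "Small2", "Small3", "Large"),
  romney   = c(40, 45, 50, 900),
  santorum = c(60, 55, 70, 600)
)

# Number of counties each candidate carried.
counties_won <- c(
  romney   = sum(votes$romney > votes$santorum),
  santorum = sum(votes$santorum > votes$romney)
)

# Total votes across all counties.
total_votes <- colSums(votes[, c("romney", "santorum")])

print(counties_won)
print(total_votes)
```

In this toy data the second candidate carries three of four counties, yet the first has the larger vote total because of the single large county.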
Romney vs Paul
Analysis:
The first and third place candidates don’t have a dramatically different map or distribution than the first and second. The map shows a few Paul strong points, but the vast majority of the counties were very close races. The histogram backs this up, and shows larger win percentages for Romney in more counties.
Santorum vs Paul
Analysis:
Aside from Santorum’s large win in the northwest, it looks like Paul actually had more winning counties than Santorum. Both candidates did well and were competitive throughout the state.
Wrapup
The heat maps and differential maps give us an unbiased view into how the Iowa Republican 2012 caucuses turned out. Each of the top 3 candidates was a contender in almost every county, and had one or two things gone differently, any of them could have won the caucus. Please feel free to download and hack the code for this article from my github page. Thank you for reading, and please direct any questions or comments to me using the comment form below or via email at: jjh@offensivepolitics.net.