
More Chicago Mayoral Analysis

[This article was first published on Offensive Politics » R, and kindly contributed to R-bloggers.]

Introduction

In a previous post on the Chicago Mayoral primary I looked at plotting returns on maps as a way to better understand the outcome. Maps help us visually determine if there is a geographic or clustering component to returns, but they aren’t the most rigorous way to compare election returns.

Another way to view the returns is to use a Seats-Votes plot, as I did in my 2010 election returns blog post. Briefly, a Seats-Votes plot is a smoothed Gaussian kernel density estimate of the returns. Please refer to the Visualizing US House Election Returns post for an annotated example plot and more general information.
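As a rough illustration of what sits underneath such a plot, here is a minimal sketch using base R's density() with a Gaussian kernel. The vector pct and its values are simulated for this example and are not the Chicago returns.

```r
# Hypothetical precinct-level vote percentages (simulated, not real returns)
set.seed(42)
pct <- c(rnorm(300, mean = 60, sd = 8), rnorm(100, mean = 30, sd = 10))
pct <- pmin(pmax(pct, 0), 100)   # clamp to the valid 0-100 range

# Gaussian kernel density estimate; the bandwidth is chosen automatically
kde <- density(pct, kernel = "gaussian", from = 0, to = 100)

# The vote percentage at which the estimated density is highest
peak <- kde$x[which.max(kde$y)]

plot(kde, main = "Precincts-Votes Curve (simulated)", xlab = "Vote %")
rug(pct)                  # one tick per precinct along the x axis
abline(v = 50, lty = 2)   # the two-party win threshold
```

The height of the curve at a given percentage reflects how many precincts reported returns near that value, which is the reading used for the plots in this post.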

We’re going to create a Precincts-Votes plot of the Chicago Mayoral Democratic primary election returns. It shows a smoothed curve estimating how many precincts reported returns at a given percentage. These curves have traditionally been used in two-party races, where 50% is the cutoff for a win. But the Chicago Mayoral Primary is a 6-way race, so a simple plurality is all that is required to win a precinct. Things are further complicated because the overall winner must receive more than 50% of the overall vote to avoid a runoff. I’m going to explore win percentages in this race from a multi-candidate perspective in a future blog post, but keep this in mind while looking at these plots.
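To make the difference between the two thresholds concrete, here is a small sketch with invented vote counts. The data frame votes, the candidate names, and every number in it are hypothetical.

```r
# Hypothetical vote counts for three precincts and three candidates
votes <- data.frame(
  precinct = c("A", "B", "C"),
  cand_x   = c(120, 90, 200),
  cand_y   = c(100, 95, 50),
  cand_z   = c(80, 60, 40)
)
cands <- c("cand_x", "cand_y", "cand_z")

# Precinct winner: the largest count wins, no majority needed (plurality)
votes$winner <- cands[max.col(as.matrix(votes[, cands]), ties.method = "first")]

# The winner's share of the vote in each precinct
votes$winner_pct <- 100 * apply(votes[, cands], 1, max) / rowSums(votes[, cands])

# Citywide share per candidate; a runoff is avoided only above 50%
overall_pct <- 100 * colSums(votes[, cands]) / sum(votes[, cands])
avoids_runoff <- any(overall_pct > 50)
```

In this made-up example cand_x carries two of the three precincts, one of them with only 40% of the vote, yet ends up with about 49% citywide, so avoids_runoff is FALSE.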

Updated R code and plots are available on the Chicago Mayor 2011 github page.

Data Preparation

The election returns from the previous blog post are in a wide format, one line per precinct. To effectively plot these we’re going to need to convert this to a long format. We’ll use the melt function.

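# melt() is assumed here to come from reshape2; the older reshape package also provides it
library(reshape2)
# take the wide df variable and melt it to a LONG format
df.m <- melt(df, id.vars = "WARD_PRECINCT",
   measure.vars = c("emanuel_pct","delvalle_pct","braun_pct","chico_pct","watkins_pct","walls_pct"))
head(df.m)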

Now our data looks like this:

WARD_PRECINCT  variable     value
1-1            emanuel_pct  46.59
1-2            emanuel_pct  64.07
1-3            emanuel_pct  60.12
1-4            emanuel_pct  59.24
1-5            emanuel_pct  66.67
1-6            emanuel_pct  54.51

Plots

The Winner

First we’ll make the precincts-votes plot for the winner of the election, Rahm Emanuel.

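library(ggplot2)  # provides qplot()
# density curve plus a rug of individual precincts along the x axis
qplot(emanuel_pct, data=df, geom=c("density","rug"),
      main="Precincts-Votes Curve, Chicago Mayor 2011",
      xlab="Vote %", ylab="Density")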

Precincts-Votes curve, Emanuel Only, Chicago Mayoral 2011

This plot is information-heavy, but it can be decoded fairly easily. We see a large cluster of precincts where Mr. Emanuel received between 50 and 65 percent of the vote, far fewer precincts in the 20-40 percent range, and very few at either extreme. The large spike near 60% implies Mr. Emanuel performed better in many precincts than his 55% overall vote total would lead us to believe.
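One possible reason a typical precinct can sit away from the citywide figure is that the overall share is weighted by turnout, while the density curve counts every precinct equally. This is offered as an assumption, not something the returns here confirm. The sketch below uses invented numbers; pct and ballots are hypothetical.

```r
# Hypothetical precincts: one candidate's share and the ballots cast in each
pct     <- c(62, 60, 58, 61, 35, 30)         # vote % in each precinct
ballots <- c(200, 250, 220, 230, 900, 800)   # total ballots cast

# Every precinct counts once: this is the view the density curve gives
median_pct <- median(pct)

# Precincts weighted by turnout: this is the citywide vote share
overall_pct <- weighted.mean(pct, w = ballots)
```

With these numbers the median precinct is at 59% while the turnout-weighted share is about 42%, because the two weakest precincts cast the most ballots.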

All Candidates

Now we’ll create a precincts-votes plot for all candidates combined, but we’ll leave off the rug plot along the bottom. This will allow us to compare returns for all candidates at once.

# one density curve per candidate, with a vertical reference line at 50%
qplot(x=value,group=variable,color=variable,data=df.m,geom="density",
		main="Precincts-Votes Curve, Chicago Mayor 2011",xlab="Vote %",ylab="Density") + 
		scale_color_brewer(name="Candidate") + geom_vline(xintercept=50)

Precincts-Votes curve, All Candidates, Chicago Mayoral 2011

This combined precincts-votes chart is not as useful for comparing returns as I would like. Several of the candidates received near zero votes in many precincts, which produces tall, narrow peaks near zero and stretches the density scale so that the other curves are flattened. We’ll drop the 3 worst performing candidates and build the chart again.
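The scale problem described above follows from how a density curve works: every curve has a total area of one, so returns packed into a narrow range must produce a tall peak. A minimal sketch with simulated data shows the effect. The vectors minor and major are hypothetical and are not taken from the returns.

```r
# Simulated vote percentages for two hypothetical candidates
set.seed(1)
minor <- pmax(rnorm(500, mean = 2, sd = 1), 0)   # clustered near zero
major <- rnorm(500, mean = 55, sd = 12)          # spread over a wide range

# Height of the tallest point on each estimated density curve
peak_minor <- max(density(minor)$y)
peak_major <- max(density(major)$y)

# How many times taller the narrow curve is than the wide one
ratio <- peak_minor / peak_major
```

When both curves share one y axis, the axis has to accommodate the taller peak, which is why the curves for the stronger candidates look squashed in the combined chart.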

Top 3

# keep only the three strongest candidates
top3 <- c("emanuel_pct", "delvalle_pct", "chico_pct")
qplot(x=value,group=variable,color=variable,data=subset(df.m, variable %in% top3),
 		geom="density",main="Precincts-Votes Curve, Chicago Mayor 2011",xlab="Vote %",ylab="Density") + 
 		scale_color_brewer(name="Candidate", labels=c("Emanuel", "Del Valle", "Chico"),
		breaks=top3) + geom_vline(xintercept=50)

Precincts-Votes curve, Emanuel, Del Valle, Chico, Chicago Mayoral 2011

This chart is quite a bit better than the last. Candidates Emanuel, Chico and Del Valle received 55%, 23% and 9% of the overall vote respectively, and I think this chart helps us better understand these totals. We see Del Valle underperformed in the vast majority of precincts, but was still competitive in several with returns between 20 and 40 percent. Candidate Chico has a large grouping in the 10% return range, which is interesting given that he received 23% of the overall vote total. His ten percent wouldn’t win him a precinct, but it would keep him competitive in the overall total. This is important given that a simple plurality may win a precinct in a 6-way race. We can also see the candidates were all moderately competitive in many of the precincts around the 20-40 percent return range, which is what you would expect an average return to be for a contested precinct.

I hope this post has shown that precincts-votes curves can still be informative in a multi-candidate race, and has helped us better understand the makeup of the Chicago Mayoral Democratic primary. In my next blog post I’ll look at a way to visualize precinct returns in a multi-candidate race and how to measure the overall competitiveness of an election.

Code, data, and output are available on the Chicago Mayor 2011 github repository. Comments, questions, and pull requests are greatly appreciated.

To leave a comment for the author, please follow the link and comment on their blog: Offensive Politics » R.
