[This article was first published on A Distant ObserveR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
PEBOS is over. Time to look at the details of the election. The final results are not yet in, but the exit polls are there, and up for grabs. Just to warm up, here’s a tiny example.
Obviously Romney had an age problem. But for now I don’t want to speculate about political consequences. This is just an example plot.
Let’s imagine we have a data.frame “EP” that contains the state level exit polls for the presidential election 2012. (Actually, I have these data, and tomorrow I’ll post how I got them using R – and a tiny bit of Python. For today I just let them reside in a file called “PresExitPolls2012.Rdata”.)
Update: I’ve released the code to create the PresExitPolls2012.Rdata file today.
I fire up R and the first code snippet is
library(ggplot2)
library(plyr)
library(reshape)
load(file = "PresExitPolls2012.Rdata")
head(EP)
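For readers who have not used .Rdata files before: load() restores objects under the names they were saved with, which is why EP is simply there after the call above. Here is a minimal sketch of that behaviour, using a throwaway object (demo_df) and a temporary file that are placeholders and have nothing to do with the real exit poll file.

```r
# demo_df and the temp file are hypothetical stand-ins, not the exit poll data.
demo_df <- data.frame(x = 1:3)
path <- tempfile(fileext = ".Rdata")

save(demo_df, file = path)   # write the object to disk
rm(demo_df)                  # remove it from the workspace

restored <- load(path)       # load() returns the names of the restored objects
restored
exists("demo_df")            # the object is back under its original name
```

Capturing the return value of load() is a handy way to find out what a file contains when you did not create it yourself.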
For now I just concentrate on the “Vote by Age”. There are two different age groupings for that question:
unique(EP$QNo[EP$question == "Vote by Age"])
# 4 category breakdown
head(EP[EP$QNo == 2, ])
# 6 category breakdown
head(EP[EP$QNo == 3, ])
Today I want to produce a plot of the 6 category breakdown, so I reduce the data and do some checks:
1. There might be some inconsistency between states in the numbering of the questions. There should be 6 categories for each state.
2. This year’s exit polls have been conducted in 31 states. In addition to this the reduced dataset should contain the nation wide data. So I expect 32 “states” in the newly created VbA dataset.
Both checks can easily be implemented with the daply function from Hadley’s plyr package:
VbA <- EP[EP$QNo == 3, ]
unique(daply(VbA, .(state), nrow)) == 6
length(daply(VbA, .(state), nrow)) == 32
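The same two checks can also be sketched in base R without plyr. The data frame below (VbA_demo) is an invented stand-in with two placeholder states, so the expected state count is 2 rather than 32. Only the column names state and answer are taken from the post.

```r
# Toy stand-in for VbA: two invented "states" with six age groups each.
# The labels are placeholders, not the real exit poll categories.
VbA_demo <- data.frame(
  state  = rep(c("State A", "State B"), each = 6),
  answer = rep(paste("group", 1:6), times = 2),
  stringsAsFactors = FALSE
)

rows_per_state <- table(VbA_demo$state)  # number of rows for each state

all_six  <- all(rows_per_state == 6)     # check 1: six categories everywhere
n_states <- length(rows_per_state)       # check 2: 2 here, 32 in the post

stopifnot(all_six)
```

Wrapping the comparison in all() gives a single TRUE or FALSE even when the states disagree, whereas comparing unique() counts to 6 returns one value per distinct count.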
The plot needs the data to be in “long format”. I let Hadley’s melt function (from the reshape package) do the job. Then I remove all candidates with the exception of Obama and Romney.
vba <- melt(VbA, id = c("state", "answer"), variable_name = "Candidate")
unique(vba$Candidate)
# we're only interested in Obama and Romney
vba <- vba[vba$Candidate %in% c("Obama", "Romney"), ]
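To make “long format” concrete, here is a small base R sketch of the same reshaping idea. It is not what melt does internally, and every number in it is invented for illustration. The Other column is an assumed extra candidate column, included only to show the filtering step.

```r
# Invented wide data: one row per state and age group, one column per candidate.
wide_demo <- data.frame(
  state  = c("State A", "State A"),
  answer = c("group 1", "group 2"),
  Obama  = c(55, 48),
  Romney = c(42, 50),
  Other  = c(3, 2),
  stringsAsFactors = FALSE
)

candidates <- c("Obama", "Romney", "Other")

# Long format: one row per state, age group and candidate.
long_demo <- data.frame(
  state     = rep(wide_demo$state,  times = length(candidates)),
  answer    = rep(wide_demo$answer, times = length(candidates)),
  Candidate = rep(candidates, each = nrow(wide_demo)),
  value     = unlist(wide_demo[candidates], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only the two main candidates, as in the post.
long_demo <- long_demo[long_demo$Candidate %in% c("Obama", "Romney"), ]
```

Each candidate column has become a block of rows, which is the shape ggplot2 needs in order to map Candidate to colour.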
Finally the plot can be created. Initially the plot was a mess with garbled and unreadable text elements. I’m indebted to the people over at is.R() for their most valuable hints that helped me arrive at a readable plot.
But before plotting there’s a fix to be applied. In the vba data.frame the values for the candidates were not numeric. For some reason I have yet to look into, this made the NAs appear as peaks, with both candidates having roughly the same value of about 70. (Thanks to lemonlaug, whose comment alerted me to the absurdity in the original plot.)
Now to the fix. It’s as simple as that:
vba$value <- as.numeric(vba$value)
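A word of caution on this kind of conversion: the post does not say whether value was stored as character or as a factor, and as.numeric() behaves very differently for the two. The sketch below uses invented toy values to show the difference.

```r
# Toy values, invented for illustration; not taken from the exit polls.
chr <- c("60", "37", NA)
fct <- factor(chr)

from_chr   <- as.numeric(chr)                # parses the text: 60, 37, NA
from_fct   <- as.numeric(fct)                # internal level codes, not percentages
from_fct_ok <- as.numeric(as.character(fct)) # safe route for a factor
```

If is.factor(vba$value) turns out to be TRUE, going through as.character() first is the safer route. For a character column the one-liner above is enough.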
Here’s the final code snippet:
png(file = "VbA2012.png", width = 960, height = 960)
ggplot(vba, aes(answer, value)) +
  geom_line(aes(group = Candidate, color = Candidate)) +
  facet_wrap(~ state, ncol = 4) +
  labs(title = "2012 Presidential Vote by Age\n",
       y = "Percentage\n",
       x = "Age group\n") +
  theme(axis.text.x = element_text(colour = "black", size = 9, angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(colour = "black", size = 9, angle = 0, vjust = 1, hjust = 1)) +
  # value is numeric after the as.numeric() fix, so the y axis needs a continuous scale
  scale_y_continuous(breaks = c(30, 50, 70)) +
  scale_colour_manual(values = c("darkblue", "darkred"))
dev.off()
To leave a comment for the author, please follow the link and comment on their blog: A Distant ObserveR.