Revisiting the GOP Race with the Huff Post API and pollstR
Well, one election is over, but it is never too soon to start another, or in this case revisit the past four years.
One day after the 2008 US Presidential election, Rasmussen polled 1000 likely voters on their choice for the 2012 Republican Presidential candidate.
The overwhelming favourite was Sarah Palin, who garnered 64% of the preferences, with Huckabee (12%) and Romney (11%) the only others to reach double digits. And thus started arguably the most topsy-turvy race in election history, ending in ultimate defeat.
The folks at the Huffington Post have kindly produced an API for stacks of opinion polls, and Drew Linzer has written an R function, pollstR, available on GitHub, to interact with it.
The first step is to determine which Huffington Post (HP) poll contains the data.
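library(XML)
library(RJSONIO)  # fromJSON() is not part of XML; RJSONIO is assumed here (rjson offers a similar function)
library(ggplot2)
library(plyr)

url <- "http://elections.huffingtonpost.com/pollster/api/charts"
raw.data <- readLines(url, warn = FALSE)
# collapse to a single string in case the response spans several lines
rd <- fromJSON(paste(raw.data, collapse = ""))

# collect the slug (short name) of every poll
pollName <- c()
for (i in seq_along(rd)) {
  pollName <- append(pollName, rd[[i]]$slug)
}
# print once, after the loop, rather than on every iteration
print(pollName)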
This provides a list of 345 polls, and a quick perusal shows that the required one is named “2012-national-gop-primary”. This name can be plugged into the aforementioned pollstR function, once it has been sourced, and the resulting data analysed.
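Rather than scanning every slug by eye, a pattern match can narrow the list. This is a minimal sketch in base R that uses a made-up vector in place of the real pollName; only the last slug comes from the post, and the other two are hypothetical.

```r
# Hypothetical stand-in for the pollName vector built above
poll_names <- c("2012-general-election-example",  # made-up slug
                "2012-iowa-gop-primary-example",  # made-up slug
                "2012-national-gop-primary")      # slug named in the post

# Keep only the slugs that mention both "national" and "gop"
gop_national <- poll_names[grepl("national", poll_names) & grepl("gop", poll_names)]
print(gop_national)
```

With the real list, the same two grepl() calls could be applied to pollName directly.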
library(reshape2)  # melt() is assumed to come from reshape2

# extract data to a data.frame
polls <- pollstR(chart = "2012-national-gop-primary", pages = "all")

# look at the structure
colnames(polls)  # 43 columns, most of them names of candidates
# [1] "id" "pollster" "start.date" "end.date" "method" "subpop" "N" "Romney" "Gingrich" ...

# the data needs to be reshaped - for my purpose I just need the end.date and candidates data
polls <- polls[, c(4, 8:43)]
polls.melt <- melt(polls, id = "end.date")

# set meaningful columns
colnames(polls.melt) <- c("pollDate", "candidate", "pc")

# get a list of candidates that have polled 10% or more at least once
contenders <- ddply(polls.melt, .(candidate), summarize, max = max(pc, na.rm = TRUE))
contenders <- subset(contenders, max > 9)$candidate
# eliminate results for undecideds etc.
contenders <- contenders[c(-4, -5, -7, -11, -18)]

# I want to plot each poll leader and have their name show on the max value for when they led
polls.melt <- arrange(polls.melt, desc(pc))
# rank candidates within each poll date (1 = leader)
polls.melt <- ddply(polls.melt, .(pollDate), transform, order = seq_along(pc))
leaders <- subset(polls.melt, candidate %in% contenders & order == 1)

# romney has two pc of 57% so need to hack for a clear graph
leaders[96, 3] <- 56

# create highest poll (when leading) for each candidate
# leaders has no max column yet, so compute each candidate's best leading score first
leaders <- ddply(leaders, .(candidate), transform, max = max(pc, na.rm = TRUE))
leaders$best <- ifelse(leaders$pc == leaders$max, "Y", "N")

# now produce graph
q <- ggplot(leaders, aes(as.POSIXct(pollDate), pc)) + geom_point(aes(colour = candidate))
q <- q + geom_text(aes(label = candidate, colour = candidate), vjust = -1, size = 3,
                   data = leaders[leaders$best == "Y", ])
q <- q + ggtitle("Leader of GOP polls and Maximum value by Candidate") + ylab("%") + xlab("") + theme_bw()
q
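The rank-and-pick-the-leader step is easier to follow on a tiny example. The sketch below uses invented numbers and base R only (no plyr or reshape2), so it illustrates the idea rather than reproducing the code above. The candidate labels "A" and "B" and all percentages are made up.

```r
# Invented long-format data: one row per poll date and candidate
toy <- data.frame(
  pollDate  = as.Date(c("2011-01-01", "2011-01-01", "2011-02-01", "2011-02-01")),
  candidate = c("A", "B", "A", "B"),
  pc        = c(30, 25, 20, 28),
  stringsAsFactors = FALSE
)

# Sort by date, then by descending percentage within each date
toy <- toy[order(toy$pollDate, -toy$pc), ]

# The first row for each date is that poll's leader
toy_leaders <- toy[!duplicated(toy$pollDate), ]

# Flag each candidate's best showing while leading
toy_leaders$best <- ifelse(
  toy_leaders$pc == ave(toy_leaders$pc, toy_leaders$candidate, FUN = max),
  "Y", "N"
)
print(toy_leaders)
```

Here ave() plays the role of the per-candidate maximum, and the rows flagged "Y" are the ones that would receive a name label in the plot.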
For the first couple of years, Palin, Huckabee and Romney continued to dominate, but when the race commenced for real, an amazing eleven participants, Donald Trump among them, ended up topping a poll on at least one occasion.
It is worthwhile looking at individual candidates’ performance over the final 18 months.
library(scales)  # provides date_breaks() and date_format()

# smoothed trend per contender, one facet each, from 2011 onwards
p <- ggplot(subset(polls.melt, candidate %in% contenders & pollDate > "2010-12-31"), aes(pollDate, pc))
p <- p + geom_smooth(se = FALSE) + facet_wrap(~candidate) +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))
p <- p + ggtitle("Smoothed results of National Polls - GOP Race") + ylab("%") + xlab("") + theme_bw()
# white bold facet labels on a red strip
p <- p + theme(strip.text.x = element_text(colour = "White", face = "bold"),
               strip.background = element_rect(fill = "#CB3128"))
p
Once Palin and Huckabee had proved uninspiring, the field narrowed to the cultish Ron Paul, the ‘meh’ candidate Romney, and a host of short-lived shooting stars.