Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but another commonly used dataset is the RealClearPolitics polling average.
As you can see when you go to RCP, they have a nice HTML5 graph (screenshot above), over which you can hover with your mouse to reveal daily point estimates. Unfortunately, the numbers behind those point estimates are a little tricky to tease out; at least, they were tricky for me. I did manage to wrangle out the Romney vs. Obama daily averages, which you can download here [CSV].
Fortunately, RCP stores its time series data in XML, so the method I used to get those Romney vs. Obama numbers can be used to collect any RCP data, such as from this comparison of Obama & Bush Job Approval. Just view the page source, search ([CTRL-F]) for “xml,” and try to identify the XML file from which the graph is drawing data:
In this case, the file appears to be o_vs_b6.xml, which we can find listed in this directory of all RCP XML files and graph-drawing code.
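If you'd rather not eyeball the page source, the same search can be roughed out in base R. This is only a sketch: the `page_source` lines below are a made-up stand-in for the real markup (which will look different), and the pattern simply grabs anything that looks like a file name ending in `.xml`.

```r
# Hypothetical snippet of page source; the real RCP markup will differ.
page_source <- c(
  '<div id="chart"></div>',
  '<script>var settings = {data_file: "o_vs_b6.xml"};</script>'
)

# Match runs of file-name characters that end in ".xml"
pattern <- "[A-Za-z0-9_/.-]+\\.xml"

# gregexpr() finds every match per line; regmatches() extracts the text
hits <- unlist(regmatches(page_source, gregexpr(pattern, page_source)))
xml_files <- unique(hits)

print(xml_files)
```

For a real page, you could fill `page_source` with `readLines()` on a saved copy of the HTML and run the same two lines.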
From there, you can just use the R package XML and the following code as a guide for neatly folding the XML data into a data.frame. It will take a little effort on your part (i.e., it’s not just “CTRL-A, CTRL-R”), but the XML should be consistently formatted, and thus not too difficult to parse.
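As a rough sketch of what that parsing can look like, here is one way to do it with the XML package. The layout is an assumption on my part: a `<series>` block holding the dates and one `<graph>` block per line on the chart, with `<value>` nodes matched up by an `xid` attribute. The sample document and its numbers are made up for illustration (they are not real RCP values), and `rcp_to_df` is a hypothetical helper name, so check the element and attribute names against the actual file before relying on it.

```r
library(XML)

# Made-up sample in the ASSUMED layout; values are placeholders, not RCP data.
xml_text <- '<chart>
  <series>
    <value xid="0">11/04/2012</value>
    <value xid="1">11/05/2012</value>
  </series>
  <graphs>
    <graph gid="1" title="Obama">
      <value xid="0">47.5</value>
      <value xid="1">48.0</value>
    </graph>
    <graph gid="2" title="Romney">
      <value xid="0">47.0</value>
      <value xid="1">47.5</value>
    </graph>
  </graphs>
</chart>'

# Hypothetical helper: fold the parsed document into one row per date
rcp_to_df <- function(doc) {
  # Dates live under <series>, keyed by the xid attribute
  out <- data.frame(
    xid  = xpathSApply(doc, "//series/value", xmlGetAttr, "xid"),
    date = as.Date(xpathSApply(doc, "//series/value", xmlValue),
                   format = "%m/%d/%Y"),
    stringsAsFactors = FALSE
  )
  # Each <graph> becomes one column, named after its title attribute
  for (g in getNodeSet(doc, "//graphs/graph")) {
    vals <- data.frame(
      xid   = xpathSApply(g, "./value", xmlGetAttr, "xid"),
      value = as.numeric(xpathSApply(g, "./value", xmlValue)),
      stringsAsFactors = FALSE
    )
    names(vals)[2] <- xmlGetAttr(g, "title")
    # Join on xid so a missing day in one graph shows up as NA
    out <- merge(out, vals, by = "xid", all.x = TRUE)
  }
  out[order(out$date), ]
}

doc <- xmlParse(xml_text, asText = TRUE)
polls <- rcp_to_df(doc)
print(polls)
```

To try it on a real file, save the XML locally and pass its path to `xmlParse()` instead of the inline string. Joining on `xid` rather than relying on node order is a deliberate choice: if one candidate's series is missing a day, the other columns stay aligned.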