
Gathering RealClearPolitics Polling Trends with XML

[This article was first published on is.R(), and kindly contributed to R-bloggers].

Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but a commonly-used dataset is the RealClearPolitics polling average.

As you can see when you go to RCP, they have a nice HTML5 graph (screenshot above), over which you can hover with your mouse to reveal daily point estimates. Unfortunately, the numbers that compose those point estimates are a little tricky to tease out — at least, it was tricky for me. Fortunately, I managed to wrangle out the Romney vs. Obama daily averages, which you can download here [CSV].

Better still, RCP stores its time series data in XML, meaning that the method I used to get those Romney vs. Obama numbers can be used to collect any RCP data, such as from this comparison of Obama & Bush Job Approval. Just view the page source, search ([CTRL-F]) for “xml,” and try to identify the XML file from which the graph is drawing data.

In this case, the file appears to be o_vs_b6.xml, which we can find listed in this directory of all RCP XML files and graph-drawing code.
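If you would rather not hunt through the page source by eye, the same search can be roughed out in base R. This is only a sketch: the `page_source` lines and the `/charts/` paths below are made up for illustration (only the file name o_vs_b6.xml comes from the example above), and `find_xml_refs` is a hypothetical helper, not anything RCP or a package provides.

```r
# Hypothetical page source; a real RCP page will look different.
page_source <- c(
  '<script type="text/javascript">',
  '  var settings_file = "/charts/settings.xml";',
  '  var data_file = "/charts/o_vs_b6.xml";',
  '</script>'
)

# Hypothetical helper: return every path-like token ending in ".xml".
find_xml_refs <- function(lines) {
  pattern <- "[A-Za-z0-9_./-]+\\.xml"
  hits <- regmatches(lines, gregexpr(pattern, lines))
  unique(unlist(hits))
}

xml_refs <- find_xml_refs(page_source)
print(xml_refs)
```

On a live page you would fill `page_source` with something like `readLines()` on the page URL, then pick the data file out of the candidates by hand, since settings files and data files can both end in .xml.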

From there, you can just use the R package XML and the following code as a guide for neatly folding the XML data into a data.frame. It will take a little effort on your part (i.e. it’s not just “CTRL-A, CTRL-R”), but the XML should be consistently-formatted, and thus not too difficult to parse.

https://gist.github.com/4086452
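For a rough idea of what the folding step can look like, here is a minimal sketch using the XML package. It is not the code from the gist. The element and attribute names (`series`, `graphs`, `graph`, `value`, `xid`, `title`) are assumptions about the file layout, and the dates and numbers are placeholder values rather than real polling data, so check them against the actual XML before adapting this.

```r
library(XML)

# Made-up stand-in for an RCP data file. The layout and the values are
# assumptions for illustration, not real polling numbers.
xml_text <- paste(
  '<chart>',
  '  <series>',
  '    <value xid="0">11/01/2012</value>',
  '    <value xid="1">11/02/2012</value>',
  '    <value xid="2">11/03/2012</value>',
  '  </series>',
  '  <graphs>',
  '    <graph title="A">',
  '      <value xid="0">50.0</value>',
  '      <value xid="1">50.5</value>',
  '      <value xid="2">51.0</value>',
  '    </graph>',
  '    <graph title="B">',
  '      <value xid="0">49.0</value>',
  '      <value xid="1">48.5</value>',
  '      <value xid="2">48.0</value>',
  '    </graph>',
  '  </graphs>',
  '</chart>',
  sep = "\n"
)

doc <- xmlParse(xml_text, asText = TRUE)

# The shared date axis becomes the first column.
dates <- xpathSApply(doc, "//series/value", xmlValue)
polls <- data.frame(date = as.Date(dates, format = "%m/%d/%Y"))

# Each <graph> becomes one numeric column, named by its title attribute.
for (g in getNodeSet(doc, "//graphs/graph")) {
  title <- xmlGetAttr(g, "title")
  vals <- xpathSApply(g, "./value", xmlValue)
  polls[[title]] <- as.numeric(vals)
}

print(polls)
```

The loop assumes every graph has one value per date, in the same order as the series. If a real file skips days for some series, you would want to match on the `xid` attribute instead of relying on position.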
