Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Who lies the most in US politics?
Most Americans, and anyone who follows US politics, will be aware of the tremendous changes and volatility that have struck the US political landscape in the past year. The ascent of Donald Trump from a billionaire entertainer to a fully fledged presidential candidate, alongside the unexpected popularity of Bernie Sanders and the nomination of Hillary Clinton as the first female presidential candidate, has made this one of the most entertaining, vicious and confusing political seasons.
There have been a number of outlandish claims made recently, and one of the most important contributions of journalism is to examine and verify statements made by major political figures. One example of such work is published and maintained by Politifact, which regularly fact-checks statements made by a variety of sources, and displays the information in the following way:
I have casually browsed this website for a while, but I recently decided to try to extract more information from it. In particular, I wanted to:
- visualize a summary of the overall distribution of statement veracity for each individual.
- build a web app to visualize direct comparisons of statements between individuals.
In this post, I will collect and analyze some of the Politifact data, while also going through the steps that allowed me to create the web application displayed below.
Obtaining data on political statements
To begin, I wrote a Python script to scrape all the required data from Politifact. The code snippet below iterates through each page of the Politifact website to extract and process the HTML source code containing the following data points:
- the individual that made the statement
- a URL to a headshot of the individual
- a categorical variable giving the rating assigned to the statement, which is one of the following: True; Mostly True; Half-True; Mostly False; False; Pants on Fire!; No Flip; Half Flip; Full Flop.
Below is the source code used to scrape the Politifact data
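As a rough illustration of this kind of extraction (not the original script), here is a minimal sketch using only the Python standard library. The markup in `SAMPLE_PAGE`, the class names `statement`, `headshot`, `speaker` and `rating`, and the helper `parse_statements` are all hypothetical: Politifact's real page structure is not described in this post, and fetching and paging through the site is left out.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; the real Politifact HTML differs.
SAMPLE_PAGE = """
<div class="statement">
  <img class="headshot" src="https://example.com/trump.jpg">
  <a class="speaker">Donald Trump</a>
  <span class="rating">Mostly False</span>
</div>
<div class="statement">
  <img class="headshot" src="https://example.com/clinton.jpg">
  <a class="speaker">Hillary Clinton</a>
  <span class="rating">True</span>
</div>
"""


class StatementParser(HTMLParser):
    """Collects one record per (assumed) <div class="statement"> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled in, if any
        self._field = None    # text field the next data chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if tag == "div" and css_class == "statement":
            self._current = {"speaker": None, "headshot_url": None, "rating": None}
        elif self._current is not None:
            if tag == "img" and css_class == "headshot":
                self._current["headshot_url"] = attrs.get("src")
            elif css_class in ("speaker", "rating"):
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that follows a speaker or rating tag.
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # Assumes statement blocks contain no nested <div> elements.
        if tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_statements(html):
    """Return a list of dicts with speaker, headshot_url and rating."""
    parser = StatementParser()
    parser.feed(html)
    parser.close()
    return parser.records


records = parse_statements(SAMPLE_PAGE)
for record in records:
    print(record)
```

A real scraper would download each listing page in turn and pass its HTML to a function like `parse_statements`, then append the resulting records to a single table.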
A quick analysis of the Politifact data
With the data all gathered, I proceeded to perform a small analysis of the data. To begin, I looked at the different types of comments made by the entities in the Politifact data. For example, what proportion of Donald Trump’s comments were True or False? Or what proportion of Donald Trump’s comments were a complete U-turn from his previous comments? For the sake of simplicity, I restricted the analysis to the 20 most common entities in the Politifact data, which produced the plot below:
So what do we see? As expected, “facts” reported through blogs or social media networks (Facebook posts) tend to be predominantly false. Conversely, Democrats (Hillary Clinton, Barack Obama, Bernie Sanders) tend to make truthful comments. It is interesting to note that establishment Republicans such as Jeb Bush, Marco Rubio, Scott Walker and Rick Scott also trend towards true comments, but have a non-negligible proportion of “Mostly False” comments. The conservative-leaning Democrat Charlie Crist follows a similar pattern. Finally, the two most “anti-establishment” Republicans, Donald Trump and Ted Cruz, stand apart from their fellow politicians with a majority of “False” or “Mostly False” comments.
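The per-entity proportions described above can be computed with a few lines of Python. This is only a sketch: the records are made-up placeholders rather than real Politifact data, and the function name `rating_proportions` and its `top_n` parameter are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict

# Placeholder (speaker, rating) pairs; not real Politifact data.
records = [
    ("Speaker A", "True"),
    ("Speaker A", "False"),
    ("Speaker A", "False"),
    ("Speaker A", "Full Flop"),
    ("Speaker B", "Mostly True"),
    ("Speaker B", "True"),
    ("Speaker C", "Pants on Fire!"),
]


def rating_proportions(records, top_n=20):
    """Share of each rating for the top_n entities with the most statements."""
    totals = Counter(speaker for speaker, _ in records)
    top = [speaker for speaker, _ in totals.most_common(top_n)]

    counts = defaultdict(Counter)
    for speaker, rating in records:
        counts[speaker][rating] += 1

    return {
        speaker: {rating: n / totals[speaker] for rating, n in counts[speaker].items()}
        for speaker in top
    }


# Keep only the two most frequent speakers in this toy example.
proportions = rating_proportions(records, top_n=2)
for speaker, shares in proportions.items():
    print(speaker, shares)
```

With `top_n=20`, this would mirror the restriction to the 20 most common entities used for the plot.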
The figure below shows an aggregated view of the total proportion of True and False comments made by each entity, which clearly shows the disparities between Democrats and Republicans.
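One way to produce such an aggregated view is to collapse the ratings into two groups. The grouping below is an assumption, since the post does not say which ratings were counted as true or false: here “True” and “Mostly True” count as true, “Mostly False”, “False” and “Pants on Fire!” count as false, and “Half-True” and the flip ratings are ignored. The function name `true_false_share` is hypothetical.

```python
from collections import Counter

# Assumed grouping; the post does not specify how ratings were aggregated.
TRUE_RATINGS = {"True", "Mostly True"}
FALSE_RATINGS = {"Mostly False", "False", "Pants on Fire!"}


def true_false_share(ratings):
    """Share of true vs. false ratings, ignoring ratings in neither group."""
    counts = Counter()
    for rating in ratings:
        if rating in TRUE_RATINGS:
            counts["true"] += 1
        elif rating in FALSE_RATINGS:
            counts["false"] += 1

    judged = counts["true"] + counts["false"]
    if judged == 0:
        return {"true": 0.0, "false": 0.0}
    return {key: counts[key] / judged for key in ("true", "false")}


share = true_false_share(["True", "Mostly True", "False", "Half-True", "Full Flop"])
print(share)
```

Changing which ratings fall into each group, in particular how “Half-True” is treated, would shift the resulting proportions.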
At this point, it is important to raise a potential caveat of this analysis, which is that we are unaware of any potential bias within the Politifact data. It is conceivable that the people curating the comments are Democratic-leaning, or that the sample of comments taken for each entity is not representative. My prior opinion makes me think that this is not the case, but unfortunately I have no data to back this up.
Building the Politifact web app
The next step was to build a Shiny app that would allow anyone to perform head-to-head comparisons between individuals in the Politifact data. For simplicity, I have hosted the app on shinyapps.io, but in order to ensure full reproducibility, I have also “containerized” the app using Docker. If you have not used or heard of Docker, their own website gives a very good description:
Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.
With that in mind, the simple development workflow used to create the web app was:
- Build the Shiny app locally (i.e. on your local machine) and ensure that everything works just as expected.
- Create a file called Dockerfile, which we will use to place all the relevant instructions and commands, as well as the environment variables and configurations that will define our container.
- Build and run the Docker container to check that the app behaves as expected. If not, iterate over the steps above until you are satisfied.
- Reap the benefits of fully reproducible software across any computing platform!
The source code for the full web app can be found at the following GitHub repository, and has the following directory structure:
By cloning this GitHub repository and typing the commands below, you will be able to have a fully working web app.
Note: the above was tested on an Ubuntu machine hosted on DigitalOcean. I assume you will have to tweak things slightly when working on a local machine (e.g. if using something like Boot2Docker).
A snapshot of the app is displayed below. It lets you select two political entities and highlights the words most often used by each, along with the overall distribution of the types of comments (true, false, etc.) that they made.
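The word-highlighting idea can be sketched in a few lines of Python. This is not the app's own code (the app is built with Shiny); the example statements, the short `STOP_WORDS` set and the function name `top_words` are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "we", "will", "are"}


def top_words(statements, n=3):
    """Most frequent non-stop-words across a list of statement texts."""
    words = []
    for text in statements:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(token for token in tokens if token not in STOP_WORDS)
    return Counter(words).most_common(n)


# Made-up statements for two hypothetical entities.
entity_a = ["Taxes are too high", "We will cut taxes"]
entity_b = ["Health care is a right", "Expand health care"]

top_a = top_words(entity_a, n=1)
top_b = top_words(entity_b, n=2)
print(top_a)
print(top_b)
```

Running `top_words` on the statements of each selected entity gives the two word lists that a head-to-head view could display side by side.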
Conclusions
We scraped and analyzed Politifact data, a resource that fact-checks statements made by various political entities. We found that statements conveyed via blogs or social media were often misleading. Interestingly, we also uncovered distinct differences between major political figures within the Republican and Democratic parties. While it is important to note that the source data may be biased, we found that Democrats were generally more truthful in their political statements. To enable a deeper dive into the data, we also built a lightweight app that enables head-to-head comparisons between any pair of entities in the Politifact data. In doing so, we obtained summarized views of the veracity of comments made by entities, as well as the broad topics that they cover.