Did Russia Use Manafort’s Polling Data in 2016 Election?
[This article was first published on sweissblaug, and kindly contributed to R-bloggers].
Introduction:
On August 2, 2016, then-Trump campaign manager Paul Manafort gave polling data to Konstantin Kilimnik, a Russian widely assumed to be a spy. Before then, Manafort had ordered his protégé, Rick Gates, to share polling data with Kilimnik, and Gates did so periodically starting in April or May. The Mueller Report stated that it could not determine why Manafort was insistent on sharing this information or whether the Russians used it to further Trump’s cause (p. 130; see here for my summary of Mueller Report V1).
One theory is that Manafort wanted to show off the good work he was doing to Kilimnik’s boss, a Russian oligarch named Deripaska, to whom Manafort owed money. A more sinister hypothesis is that Manafort knew the information would be valuable in the hands of Russians trying to interfere with the election.
This post analyzes whether the Russians used the polling data, irrespective of Manafort’s intent. I looked at the Russian Facebook ads released by the House Intelligence Committee and tried to identify any changes in messaging after August 2nd. I conclude with a guess about what polling data was shared.
Russian Facebook Data:
The House Intelligence Committee released thousands of Russian advertisements by the Internet Research Agency (IRA). There have been several analyses of these advertisements that discuss their effectiveness; one good one is by Spangher et al. However, I couldn’t find any that showed the topics of the advertisements over time.
I restricted the analysis to data from 2016, which includes the period when Manafort became campaign manager and the election itself in November. Overall there are 1858 Facebook ads in this dataset. Below is a time series plot of the number of advertisements per day in 2016.
There are periods of high activity in May / June and in October right before the election.
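As a rough illustration of how such a daily count could be produced, here is a minimal Python sketch. The `ads` records and the `created` field name are hypothetical stand-ins, not the actual schema of the House Intelligence Committee release.

```python
from collections import Counter
from datetime import date

# Hypothetical ad records; the real dataset's field names may differ.
ads = [
    {"created": date(2016, 5, 30)},
    {"created": date(2016, 5, 30)},
    {"created": date(2016, 6, 1)},
    {"created": date(2016, 10, 20)},
]

def ads_per_day(records, year=2016):
    """Count ads per calendar day, keeping only ads created in `year`."""
    counts = Counter(r["created"] for r in records if r["created"].year == year)
    # Sort by date so the result can be plotted directly as a time series.
    return dict(sorted(counts.items()))

daily = ads_per_day(ads)
for day, n in daily.items():
    print(day.isoformat(), n)
```

Plotting `daily` as a line or bar chart would give the kind of activity curve described above.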
Change After August 2nd?
Each advertisement has metadata and text associated with it, including the date, the ad text, the target population, and so on. To see whether anything changed over time, and around August 2nd in particular, I first tried topic modeling and text clustering. I couldn’t find any changes or trends using these unsupervised approaches.
Instead, I built a predictive model in which the response is a binary variable (before / after August 2nd) and the explanatory variables are text features from each ad (over 1200 words). I then computed variable importance for these words to see which were most predictive. Below, for each day, I plotted the number of ads containing the important words divided by the total number of ads that day, which gives a normalized percentage.
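The post does not say which model or importance measure was used, so the following is only a sketch of the general idea under stated assumptions: a small L2-regularized logistic regression written in plain Python, with the absolute size of each word's coefficient standing in for variable importance. The ad texts, the `fit_logistic` helper, and its hyperparameters are all hypothetical.

```python
import math
from datetime import date

CUTOFF = date(2016, 8, 2)

# Hypothetical ads: (creation date, ad text plus targeting description).
ads = [
    (date(2016, 5, 10), "join our rally today"),
    (date(2016, 6, 3), "share this rally photo"),
    (date(2016, 7, 15), "join and share today"),
    (date(2016, 8, 20), "interests martin luther king rally"),
    (date(2016, 9, 5), "interests martin luther king today"),
    (date(2016, 10, 1), "share interests martin luther king"),
]

vocab = sorted({w for _, text in ads for w in text.split()})
# Binary bag-of-words features: 1.0 if the word occurs in the ad, else 0.0.
X = [[1.0 if w in text.split() else 0.0 for w in vocab] for _, text in ads]
# Response: 1.0 if the ad ran after the cutoff date, else 0.0.
y = [1.0 if day > CUTOFF else 0.0 for day, _ in ads]

def fit_logistic(X, y, lr=0.5, steps=500, l2=0.01):
    """Full-batch gradient descent for L2-regularized logistic regression."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

weights, bias = fit_logistic(X, y)
# Rank words by absolute coefficient as a crude importance measure.
importance = sorted(zip(vocab, weights), key=lambda p: abs(p[1]), reverse=True)
for word, coef in importance[:5]:
    print(f"{word:10s} {coef:+.3f}")
```

A positive coefficient marks a word that pushes the prediction toward "after August 2nd", a negative one toward "before". Tree-based importance or permutation importance would be equally reasonable choices here.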
The blue line marks when Manafort initially made contact with Kilimnik and the red line marks the August 2nd meeting. There do appear to be large increases in words associated with African American civil rights topics after 8/2. Notably, these words were not in the advertisement texts themselves but in the ‘people who liked’ description. That is, if you liked ‘Martin Luther King’ on your profile, then a particular ad would target you.
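The per-day percentage shown in the plot (ads containing an important word divided by all ads that day) could be computed along these lines. This is a minimal sketch: the ad texts and the `important_words` set are hypothetical examples, not the words the model actually selected.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ads: (creation date, ad text plus targeting description).
ads = [
    (date(2016, 7, 30), "join our rally"),
    (date(2016, 7, 30), "interests martin luther king"),
    (date(2016, 8, 5), "interests martin luther king"),
    (date(2016, 8, 5), "interests malcolm x"),
    (date(2016, 8, 5), "share this photo"),
]
# Hypothetical set of words flagged as important.
important_words = {"martin", "malcolm"}

def daily_share(records, words):
    """Fraction of each day's ads that contain at least one of `words`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in records:
        totals[day] += 1
        if words & set(text.lower().split()):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}

share = daily_share(ads, important_words)
for day, frac in share.items():
    print(day.isoformat(), f"{frac:.2f}")
```

Dividing by the daily total matters because ad volume varies a lot over 2016, so raw counts alone would mostly track the overall activity spikes.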
Another way to look at this information is to see the proportion of these words used before and after 8/2.
The above plot shows the number of times each word appeared before and after 8/2, along with P(date > 8/2 | word). For instance, the word 1954, signifying the beginning of the civil rights movement, occurred 4 times before and 376 times after 8/2, which means that just under 99% of its appearances came after that date. This suggests a change in the IRA advertisements: they focused more on targeting people who were interested in African American civil rights issues.
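The conditional proportion for a single word is just the after-count divided by the total count. A minimal sketch using the counts quoted above for the word 1954 (the `share_after` helper name is my own):

```python
def share_after(before, after):
    """Estimate P(date > cutoff | word) from before/after occurrence counts."""
    total = before + after
    if total == 0:
        raise ValueError("word never occurs, so the proportion is undefined")
    return after / total

# Counts quoted in the post for the word "1954".
p_1954 = share_after(before=4, after=376)
print(f"{p_1954:.4f}")
```

With 4 occurrences before and 376 after, this is 376 / 380, which is just under 99%.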
Conclusions / Discussions
My guess is that the polling data contained something about African Americans and how those with an interest in the civil rights movement are more susceptible to negative ads.
Do I think the evidence presented here is strong enough to believe the Russians used the polling data? Meh, not really, for a few reasons:
- All of the words found here were also used a few times before 8/2
- Gates gave information on a continuous basis. If the Russians used this data, I assume they would have incorporated it as it arrived, and there would not be a discrete change at 8/2
- I only did this for one date. If I repeated the analysis for arbitrary dates, I might well find other words associated with those dates
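The last point could be checked with a simple comparison across cutoff dates. The sketch below is not something the analysis above actually did. It only illustrates the idea on hypothetical ads, and the `min_count` and `threshold` values are arbitrary assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical ads: (creation date, ad text plus targeting description).
ads = [
    (date(2016, 5, 10), "join rally"),
    (date(2016, 6, 20), "join rally"),
    (date(2016, 7, 15), "rally photo"),
    (date(2016, 8, 20), "king photo"),
    (date(2016, 9, 5), "king rally"),
    (date(2016, 10, 1), "king vote"),
    (date(2016, 10, 20), "vote today"),
]

def words_concentrated_after(records, cutoff, min_count=2, threshold=0.9):
    """Words whose ads fall almost entirely after `cutoff`."""
    before, after = Counter(), Counter()
    for day, text in records:
        bucket = after if day > cutoff else before
        bucket.update(set(text.split()))  # count each word once per ad
    flagged = {}
    for word in set(before) | set(after):
        total = before[word] + after[word]
        if total >= min_count and after[word] / total >= threshold:
            flagged[word] = after[word] / total
    return flagged

# Compare the date of interest with two arbitrary alternative cutoffs.
for cutoff in [date(2016, 6, 1), date(2016, 8, 2), date(2016, 9, 15)]:
    flagged = words_concentrated_after(ads, cutoff)
    print(cutoff.isoformat(), sorted(flagged))
```

If arbitrary cutoffs flag about as many words as 8/2 does, the shift at 8/2 is less remarkable than it first looks.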
I’m not saying that they didn’t use the polling data, but I don’t think the evidence here is strong enough to say that they did. At a minimum, I think the IRA and the Russians adapted their ads to target different populations at different points in time. This suggests they are sophisticated and probably learn from previous results.
Code