Minute by Minute Twitter Sentiment Timeline from the VP debate

[This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers.]

Background

The data for this graph was collected automatically about every 60 seconds during the VP debate on 10/11/2012, for an aggregate sample of 363,163 tweets. Duplicate tweets (caused by bots) were then removed, leaving a final dataset of 81,124 unique tweets (52,303 for Biden, 28,821 for Ryan). Each point in the graph is the mean sentiment of the tweets gathered in that minute: the farther above zero a point is, the more positive the sentiment, and the farther below zero, the more negative. It would be very interesting to compare this against the transcript. The one very noticeable takeaway is the jump in sentiment as soon as the debate ended at 22:30.
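To make the two steps described above concrete (dropping duplicate tweets, then averaging the score per minute), here is a minimal sketch on a toy data frame. The column names and all values are invented for illustration and are not taken from the real dataset.

```r
# Toy stand-in for the collected tweets; every value here is made up.
tweets <- data.frame(
  text      = c("great answer", "great answer", "weak reply", "solid point", "bad night"),
  candidate = c("Biden", "Biden", "Ryan", "Ryan", "Biden"),
  minute    = c("21:00", "21:00", "21:00", "21:01", "21:01"),
  score     = c(2, 2, -1, 1, -2),
  stringsAsFactors = FALSE
)

# Drop repeated tweet texts (e.g. bot reposts), keeping the first occurrence.
unique_tweets <- tweets[!duplicated(tweets$text), ]

# Mean sentiment per candidate per minute: one plotted point for each row.
per_minute <- aggregate(score ~ candidate + minute, data = unique_tweets, FUN = mean)
print(per_minute)
```

A mean above zero for a candidate in a given minute means the positive words outweighed the negative ones in that minute's unique tweets.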

R Code for this data collection and graphing

To collect this data I updated my original code from the presidential debate as follows:

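    library(twitteR)  # searchTwitter()
    library(plyr)     # laply()

    # score.sentiment(), positive.words and negative.words are not defined in
    # this post; they must already exist in the session.
    vp <- function() {
      # Pull up to 1,500 recent tweets mentioning each candidate
      Ryan  <- searchTwitter('@PaulRyanVP', n=1500)
      Biden <- searchTwitter('@JoeBiden', n=1500)
      # Extract the tweet text from each status object
      textRyan  <- laply(Ryan, function(t) t$getText())
      textBiden <- laply(Biden, function(t) t$getText())
      # Score each tweet and label it with its candidate
      resultRyan <- score.sentiment(textRyan, positive.words, negative.words)
      resultRyan$candidate <- 'Ryan'
      resultBiden <- score.sentiment(textBiden, positive.words, negative.words)
      resultBiden$candidate <- 'Biden'
      # Combine both candidates and stamp the batch with the collection time
      result <- merge(resultBiden, resultRyan, all=TRUE)
      result$candidate <- as.factor(result$candidate)
      result$time <- date()
      return(result)
    }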
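The scoring function itself is not shown in this post. As a rough illustration of how a word-list sentiment score can work, here is a hypothetical minimal scorer that counts positive words minus negative words in each text. The function name `score_sketch` and the tiny word lists are assumptions made up for this example; the real `score.sentiment` may well differ.

```r
# Hypothetical minimal scorer: (# positive words) - (# negative words) per text.
score_sketch <- function(texts, pos_words, neg_words) {
  vapply(texts, function(txt) {
    # Lowercase, replace anything that is not a letter or whitespace, then split.
    cleaned <- tolower(gsub("[^[:alpha:][:space:]]", " ", txt))
    words <- unlist(strsplit(cleaned, "\\s+"))
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, numeric(1), USE.NAMES = FALSE)
}

pos_words <- c("good", "great", "strong")  # made-up mini lexicon
neg_words <- c("bad", "weak", "rude")      # made-up mini lexicon

scores <- score_sketch(
  c("Great answer, strong finish!", "Rude and weak.", "No opinion here"),
  pos_words, neg_words
)
print(scores)
```

A score of zero here just means no lexicon words matched (or the matches cancelled out), which is why a larger word list matters for short texts like tweets.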

Then, to have R collect the data automatically every 60 seconds in an endless loop (I wasn't sure at the time when I would want to stop it), you just wrap the call in a repeat loop.

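    debate <- vp()
    repeat {
      startTime <- Sys.time()
      x <- vp()
      debate <- merge(x, debate, all=TRUE)
      # Assumption: a 60-second cycle, so sleep for whatever time vp() did not use
      sleepTime <- 60 - as.numeric(difftime(Sys.time(), startTime, units="secs"))
      if (sleepTime > 0) Sys.sleep(sleepTime)
    }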
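If you would rather not stop the loop by hand, one alternative is a bounded version of the same idea. The sketch below is an assumption-laden variant, not the code I ran: `collect_once` is a hypothetical stand-in for `vp()`, and the short interval is only there so the example finishes quickly.

```r
# Hypothetical collector standing in for vp(); returns one timestamped row.
collect_once <- function() {
  data.frame(time = format(Sys.time(), "%H:%M:%OS3"), stringsAsFactors = FALSE)
}

# Call `collector` max_runs times, roughly interval_secs apart.
poll <- function(collector, interval_secs, max_runs) {
  batches <- vector("list", max_runs)
  for (i in seq_len(max_runs)) {
    start_time <- Sys.time()
    batches[[i]] <- collector()
    # Sleep only for what is left of the interval after the collection work.
    elapsed <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
    if (i < max_runs && elapsed < interval_secs) Sys.sleep(interval_secs - elapsed)
  }
  # Bind all batches once at the end instead of merging on every pass.
  do.call(rbind, batches)
}

samples <- poll(collect_once, interval_secs = 0.2, max_runs = 3)
print(samples)
```

Collecting batches in a list and binding them once avoids re-merging an ever-growing data frame on each pass, which starts to matter after a couple of hours of tweets.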

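At 10:56pm I got bored and the debate was over, so I just hit stop and ran the following to get the graph:

    library(reshape2)  # melt(); assumed to come from reshape2
    library(ggplot2)

    x <- subset(debate, !duplicated(text))   # drop duplicate tweets
    x$minute <- as.POSIXct(strptime(x$time, "%a %b %d %H:%M:%S %Y"))
    x$minute1 <- format(x$minute, "%H:%M")
    x <- subset(x, minute1 >= "21:00")       # keep tweets from 21:00 onward
    period <- unique(x$minute1)
    period <- period[order(period)]
    # Mean score per minute for each candidate; assumes score.sentiment()
    # returned a `score` column
    Biden <- sapply(period, function(p) mean(x$score[x$minute1 == p & x$candidate == "Biden"]))
    Ryan  <- sapply(period, function(p) mean(x$score[x$minute1 == p & x$candidate == "Ryan"]))
    means <- data.frame(period, Biden, Ryan)
    dfm <- melt(means, id.vars="period")     # long format: period, variable, value
    ggplot(dfm, aes(period, value, colour=variable, group=variable)) +
      geom_point() + geom_line() + xlab("time") +
      theme(axis.text.x=element_text(angle=45), axis.ticks=element_blank(), axis.title.y=element_blank())

I have to admit, doing this actually made watching the debate kind of fun.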
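The graphing code filters on minute labels as plain strings, which can look odd at first. Here is a small sketch of why that works, using made-up timestamps in the format that `date()` produces. It assumes an English locale (the `%a` and `%b` fields are locale-dependent) and fixes the time zone to UTC purely to keep the example reproducible.

```r
# Invented timestamps in the "Thu Oct 11 21:00:03 2012" style returned by date().
stamps <- c("Thu Oct 11 20:59:58 2012",
            "Thu Oct 11 21:00:03 2012",
            "Thu Oct 11 22:30:41 2012")

# Parse the text into date-times, then reduce each one to an "HH:MM" label.
parsed <- as.POSIXct(strptime(stamps, "%a %b %d %H:%M:%S %Y", tz = "UTC"))
minute_label <- format(parsed, "%H:%M")

# Zero-padded 24-hour labels sort in clock order, so a string comparison
# is enough to keep everything from 21:00 onward.
kept <- minute_label[minute_label >= "21:00"]
print(kept)
```

This shortcut only holds within a single day; a collection that ran past midnight would need to compare the parsed date-times instead of the labels.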

For cleaner access to the code, please go to my GitHub.

