Playing with Twitter Data
Last Friday, the Institute for Social Sciences hosted a great one-day conference on various aspects of the reproducibility crisis, Making Social Science Transparent. It was the first time I’ve done much tweeting during an event like this, and while it felt a little silly, it was also fun. It was nice to hear what was resonating with other people at the event, and I’m psyched to stay connected to other participants on Twitter.
It also gave me an excuse to learn to scrape and analyze Twitter data. And doing that pushed me to set up RMarkdown rendering on this Jekyll site. I’m pretty psyched about both.
I was surprised by how easy it was to get and manage Twitter data. The twitteR package is awesome. If you want to try this out yourself, the first thing to do is register your “app” with Twitter, which just takes a couple of minutes.
Now let’s see if we can find anything interesting…
Get the data
Start by loading some libraries and authenticating myself to Twitter. The arguments to setup_twitter_oauth are access tokens that Twitter gives you when you register your app; they are just strings that I’ve defined elsewhere so that you can’t see them. Along with searchTwitter and twListToDF, that is the only function you need to get and organize tweets around a particular subject. Super simple.
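Here is a minimal sketch of what that step might look like. The function name fetch_conference_tweets, the environment-variable names, and the default n are my own placeholders, not the values used for this post. The search string is left as an argument because the hashtag isn't given here.

```r
# Sketch only: wraps the three twitteR calls named above.
# Nothing touches the network until the function is actually called.
fetch_conference_tweets <- function(query, n = 1000) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("Install the twitteR package first.")
  }
  # The four tokens come from registering an app with Twitter.
  # The environment-variable names are hypothetical.
  twitteR::setup_twitter_oauth(
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
  # searchTwitter returns a list of status objects;
  # twListToDF flattens that list into one data frame, one row per tweet.
  tweet_list <- twitteR::searchTwitter(query, n = n)
  twitteR::twListToDF(tweet_list)
}
```

Calling it with a search string should return a data frame with columns such as text, created, screenName, statusSource, retweetCount, and isRetweet, which is what the sketches further down assume.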
When do people tweet?
We get 133 original tweets and 177 retweets. There were a few announcements in the days leading up to the conference, and a few retweets after, but the action was concentrated on the day of the event.
People compose original tweets during panels, not at breaks and especially not during lunch. Retweeting abates less during the breaks, perhaps due to people not at the conference.
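A rough sketch of the counting behind those statements, using base R on a tiny made-up data frame. The created and isRetweet columns are the ones twListToDF returns; the timestamps here are invented for illustration and are not the conference data.

```r
# Toy stand-in for the data frame from twListToDF (values are made up).
tweets <- data.frame(
  created = as.POSIXct(c("2000-01-01 09:15:00", "2000-01-01 09:40:00",
                         "2000-01-01 10:05:00", "2000-01-01 12:30:00",
                         "2000-01-01 12:45:00"), tz = "UTC"),
  isRetweet = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Label each tweet, then count originals vs. retweets.
tweets$type <- ifelse(tweets$isRetweet, "retweet", "original")
type_counts <- table(tweets$type)

# Bin by hour of day and cross-tabulate against tweet type.
tweets$hour <- format(tweets$created, "%H", tz = "UTC")
hourly <- table(tweets$hour, tweets$type)

print(type_counts)
print(hourly)
```

For the real data you would probably want finer bins than an hour so that breaks and lunch show up clearly.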
What platforms are people using?
Mostly what we would expect, but some bots got in on the action too.
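One way to get at the platform, sketched on made-up values: the statusSource column from twListToDF holds the client name wrapped in an HTML link, so stripping the tags leaves something you can tabulate. The example strings below are illustrative, not taken from the conference tweets.

```r
# statusSource values look like HTML anchors (examples are made up).
status_source <- c(
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
)

# Drop anything between angle brackets, keeping only the link text.
platform <- gsub("<[^>]+>", "", status_source)

# Tabulate and order from most to least used.
platform_counts <- sort(table(platform), decreasing = TRUE)
print(platform_counts)
```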
Emotional valence of tweets
Let’s look at the content of the tweets, first over the course of the day. The first panel, Defining the Issues: Research on Replication, Reproducibility, and Transparency had some hard truths in it – Brad Jones predicted it would be the “sky is falling” panel – and we see that reflected in the smoother going below zero for one of only two times in the day during that panel. The second panel was a call to action: we heard about the Open Science Framework from Katie Corker (OSF is doing two workshops on campus May 4) and Data Carpentry from Tracy Teal, as well as some ideas about what individuals and labs can do on their own. That, it seems, was the emotional high point of the day, at least until someone had a couple glasses of wine at the reception and then closed out the day with an exuberant tweet.
Do happier tweets get retweeted more?
My guess was that more emotionally-positive tweets (greater emotionalValence scores) would be retweeted more – people gravitate to happy messages, right? – but it looks like tweets with less-emotional content (closer to zero valence) get retweeted more. I wonder if it’s generally true that academics prefer and retweet emotionally neutral messages, and how that compares to the population at large. If it is generally true for academics but not others, there’s an important science-outreach lesson here.
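A quick way to check a pattern like this is to bin tweets by how far their valence is from zero and compare average retweet counts. The sketch below does that on invented numbers chosen only to show the mechanics; the 0.25 cutoff is arbitrary, and none of this is the conference data.

```r
# Made-up scores and retweet counts, purely to illustrate the comparison.
tweets <- data.frame(
  emotionalValence = c(-0.6, -0.1, 0, 0.05, 0.2, 0.7),
  retweetCount     = c(1, 4, 6, 5, 3, 0)
)

# Split on distance from zero valence (the cutoff is an arbitrary assumption).
tweets$valenceBin <- cut(abs(tweets$emotionalValence),
                         breaks = c(-Inf, 0.25, Inf),
                         labels = c("near neutral", "strongly valenced"))

# Mean retweets per bin.
mean_retweets <- tapply(tweets$retweetCount, tweets$valenceBin, mean)
print(mean_retweets)
```

With only a few hundred tweets, a scatterplot with a smoother is probably more honest than a two-bin summary, but the binning makes the comparison explicit.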
Emotional content
Now let’s look at the content of what was tweeted, parsed by the emotional valence of the tweet. This is a super naive analysis – I’m using qdap’s polarity function straight out of the box to examine the emotional valence of each tweet. One nice thing about that function is that it returns the positive and negative words found in each text, which allows me to A) tabulate the most-used positively and negatively valenced words, and B) strip those words from positive and negative tweets to see what is being talked about in positive and negative ways, without interference from the emotionally-charged words themselves.
Here is the frequency of usage in the conference tweets of words that qdap (using the sentiment dictionary of Hu & Liu, 2004) identifies as positively or negatively valenced.
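To make the idea concrete, here is a deliberately simplified, base-R stand-in for dictionary-based scoring. It is not qdap's polarity function, which also weights negators and amplifiers. The two word lists are tiny hypothetical examples rather than the Hu & Liu dictionary, and score_tweet is my own name.

```r
# Tiny hypothetical word lists (not the real sentiment dictionary).
positive_words <- c("great", "good", "exciting")
negative_words <- c("crisis", "problem", "bad")

# Score one text: (positive hits - negative hits) / sqrt(word count),
# and return the matched words so they can be tabulated later.
score_tweet <- function(text) {
  cleaned <- tolower(gsub("[^[:alpha:][:space:]]", " ", text))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  pos <- words[words %in% positive_words]
  neg <- words[words %in% negative_words]
  valence <- if (length(words) == 0) 0 else
    (length(pos) - length(neg)) / sqrt(length(words))
  list(valence = valence, pos_words = pos, neg_words = neg)
}

texts <- c("Great talk, great ideas!", "A bad problem")
scored <- lapply(texts, score_tweet)

# Tabulate the positive words found across all texts.
pos_table <- table(unlist(lapply(scored, function(s) s$pos_words)))
print(pos_table)
```

Returning the matched words alongside the score is what makes both the frequency tables and the later word-stripping step possible.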
Emotionally associated non-emotional words
No text analysis would be complete without a wordcloud. Here I’ve classified tweets by their emotionalValence score and removed the words that contribute to the valence score, to sort the content that was being talked about in emotionally different ways. Mixed success, I think – there’s not a lot of data here, and the emotional-words list could use some tuning.
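The stripping-and-counting step might be sketched like this in base R. The texts, scores, emotional-word list, and stop-word list are all made up, and content_words is a hypothetical helper. The resulting frequency tables are the kind of input a wordcloud function takes, for example wordcloud::wordcloud(words, freq).

```r
# Toy tweets with made-up valence scores.
tweets <- data.frame(
  text = c("great session on open data",
           "data sharing is a problem",
           "replication crisis data"),
  emotionalValence = c(0.8, -0.4, -0.6),
  stringsAsFactors = FALSE
)

# Hypothetical lists: words that drove the score, plus filler words.
emotional_words <- c("great", "problem", "crisis")
stop_words <- c("on", "is", "a")

# Split texts into words, drop emotional and filler words, count the rest.
content_words <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  keep <- nzchar(words) & !(words %in% c(emotional_words, stop_words))
  sort(table(words[keep]), decreasing = TRUE)
}

negative_freq <- content_words(tweets$text[tweets$emotionalValence < 0])
positive_freq <- content_words(tweets$text[tweets$emotionalValence > 0])
print(negative_freq)
print(positive_freq)
```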
Who’s retweeting whom?
Twitter’s API doesn’t tell us where a retweeter saw the tweet that they retweeted; an edge always goes from the original author of the tweet to the retweeter, so we can’t follow the diffusion of a tweet. But, we can get a sense of who is being retweeted, and we see a core of individuals engaging in a conversation at the center of the graph. Nodes are sized to their total degree (retweeting and being retweeted), and edge-width is proportional to the number of retweets between that pair. Labeled nodes are those that were retweeted at least once.
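A sketch of how such an edge list could be built, assuming retweet text starts with "RT @author:" as it does in classic-style retweets. The screen names and texts are invented. An edge list like this could then be handed to a graph library, for instance igraph::graph_from_data_frame, though that is only one option.

```r
# Made-up retweets: screenName is the retweeter, text names the original author.
retweets <- data.frame(
  screenName = c("carol", "carol", "dave", "alice"),
  text = c("RT @alice: Open data matters",
           "RT @alice: Preregister your studies",
           "RT @alice: Open data matters",
           "RT @bob: Slides are up"),
  stringsAsFactors = FALSE
)

# Pull the original author's handle out of the "RT @handle:" prefix.
retweets$author <- sub("^RT @([A-Za-z0-9_]+):.*$", "\\1", retweets$text)

# One row per author -> retweeter pair; weight = number of retweets.
retweets$n <- 1
edges <- aggregate(n ~ author + screenName, data = retweets, FUN = sum)
names(edges) <- c("from", "to", "weight")

# Total degree per account: times retweeted plus times retweeting.
degree <- table(c(retweets$author, retweets$screenName))

print(edges)
print(degree)
```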
Who’s mentioning whom?
It’s almost exclusively speakers and the host institute that get mentioned, though a few outside players get some shout-outs, especially the Open Science Framework, which was discussed in detail by Katie Corker. Unsurprisingly, the host, UCDavisISS, does a lot of mentioning and gets mentioned quite a bit. I think it’s interesting that the graph fractures almost perfectly between the host and the speakers: there is very little cross-talk between those who are mentioning and being mentioned by the host, and those who are mentioning and being mentioned by the speakers.
Edges originate at the tweeter and point to the mentioned; nodes are scaled to number of mentions.
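Mention edges can be sketched in a similar way: pull every @handle out of each original tweet and pair it with the tweeter. The tweets below are invented (only the UCDavisISS handle comes from the data), and the regular expression is a simple assumption about what a handle looks like.

```r
# Made-up original tweets; screenName is the person tweeting.
originals <- data.frame(
  screenName = c("alice", "bob"),
  text = c("Great points from @carol and @UCDavisISS",
           "Thanks @carol for the talk"),
  stringsAsFactors = FALSE
)

# Find all @handles in each tweet (a list with one character vector per tweet).
handles <- regmatches(originals$text,
                      gregexpr("@[A-Za-z0-9_]+", originals$text))

# One edge per mention, from the tweeter to the mentioned account.
mention_edges <- data.frame(
  from = rep(originals$screenName, lengths(handles)),
  to   = sub("^@", "", unlist(handles)),
  stringsAsFactors = FALSE
)

# Number of times each account is mentioned, usable for node sizing.
mention_counts <- table(mention_edges$to)

print(mention_edges)
print(mention_counts)
```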
Thanks for reading. It really was a great conference; if you’d like some real information on what went down, here is Ben Hinshaw’s summary of the event.