Visualization of a Twitter retweet network: art or useful data visualization?
This is a Twitter retweet network. When people tweet, they may get retweeted by other people, who repeat the message for their own followers to view. Each retweet is a one-way flow of information that links the original author to each person who retweeted them (that is, forwarded the original tweet into their own network). So, in this visualization we are looking at a network of people (white nodes) linked (orange lines) by information flows in the form of 140-character retweets. But is this kind of visualization helpful for analysis, or is it just computer-generated eye candy?
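To make the structure concrete, here is a minimal Python sketch of how retweet records could be turned into the directed edges described above. The record layout `(retweeter, original_author)`, the user names, and the helper `build_edges` are all hypothetical, and collapsing repeated retweets between the same pair into a single weighted edge is an assumption, not necessarily how the plotted network was built.

```python
from collections import Counter

# Hypothetical sample records: (retweeter, original_author). Not real data.
retweets = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "bob"),
    ("carol", "alice"),  # carol retweets alice a second time
]

def build_edges(records):
    """Map each directed edge (author, retweeter) to its retweet count."""
    edges = Counter()
    for retweeter, author in records:
        # Information flows from the original author to the retweeter.
        edges[(author, retweeter)] += 1
    return edges

edges = build_edges(retweets)
nodes = {user for edge in edges for user in edge}
print(len(nodes), "nodes,", len(edges), "distinct edges")
```

Keeping the count on each edge preserves how often one person amplified another, which could later be mapped to line width or opacity in a plot.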
One of the criticisms this kind of plot frequently gets is that it is a ‘yarn-ball’, meaning it’s too complicated to make sense of. Indeed, this graph contains 44,783 nodes (people) and 163,329 edges (retweets). The data for the network are a subset of our 60+ million tweets related to Occupy. Specifically, this network is made up of retweets that contain the hashtags #OccupyOakland and #OO from October 30th, 2011 to November 7th, 2011.
Despite the complexity, we can still make some observations about the data and the network. For example, the ring around the outside is unconnected to the core and represents people who tweeted with the #OccupyOakland or #OO hashtag, and were retweeted, but not by anyone in the core of the network. Also, these people did not retweet anything from anyone inside the core; at least, not in the nine days of data used in this plot. From poking around in the data I know there is a fair amount of flame (derogatory or insulting tweets lobbed into the information stream). Could some of the people in the ring be Occupy’s detractors?
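One way to separate the ring from the core is to ignore edge direction and split the graph into connected components. The following Python sketch assumes the core is simply the largest component and everything else belongs to the ring; the edge list and the function name `weak_components` are made up for illustration.

```python
from collections import defaultdict, deque

def weak_components(edges):
    """Return connected components (direction ignored), largest first."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search from this start node
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

# Hypothetical edges (author, retweeter): one core plus two isolated pairs.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "b"),
                ("x", "y"), ("p", "q")]
components = weak_components(sample_edges)
core, ring = components[0], components[1:]
print("core size:", len(core), "| ring components:", len(ring))
```

With the ring isolated this way, its tweets could be read separately to check whether they really are flame.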
We can also see that the core is extremely densely connected, but despite this there are a great many hops along links between people on the left side of the graph and those on the right (the network diameter). I haven’t calculated statistics for this, but if we decided it was important to know, it would be fairly easy to do.
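As a rough illustration of that calculation, the diameter is the longest of all shortest paths, which can be found by running a breadth-first search from every node. This Python sketch treats edges as undirected and assumes the input is a single connected component (unreachable pairs are silently ignored); the sample edges are invented. Exact all-pairs search is slow on tens of thousands of nodes, so sampling start nodes may be more practical at that scale.

```python
from collections import defaultdict, deque

def bfs_distances(adj, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(edges):
    """Longest shortest path, ignoring edge direction."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical chain a-b-c-d-e with a shortcut between b and d.
sample_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
print("diameter:", diameter(sample_edges))
```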
If we zoom in a bit we can see that the larger white blobs (my adviser calls them mushrooms) are actually clusters of users. Since they are all linked to the same node, we know that they all retweeted a single person. Thus, these mushrooms represent highly popular tweets that ‘went viral’ and effectively reached a broad audience. But some of these one-off large retweets do not appear to come from folks in the core. Could they be cases where someone was in the right place at the right time to report on an incident of interest?
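A mushroom can be approximated numerically as an author with many retweeters who have no other links in the network. The Python sketch below counts, for each author, the retweeters whose only edge is to that author. The definition, the sample edges, and the name `mushroom_sizes` are assumptions for illustration, and the edge list is assumed to contain no duplicates.

```python
from collections import Counter

def mushroom_sizes(edges):
    """Count, per author, the retweeters that have exactly one link."""
    degree = Counter()
    for author, retweeter in edges:
        degree[author] += 1
        degree[retweeter] += 1
    sizes = Counter()
    for author, retweeter in edges:
        if degree[retweeter] == 1:  # retweeter touches nobody else
            sizes[author] += 1
    return sizes

# Hypothetical edges (author, retweeter).
sample_edges = [("hub", "u1"), ("hub", "u2"), ("hub", "u3"),
                ("hub", "c1"), ("c1", "c2"), ("c2", "u4")]
sizes = mushroom_sizes(sample_edges)
print(sizes.most_common(1))
```

Sorting authors by this count gives a shortlist of candidate viral tweets to inspect by hand.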
I’ll zoom in one more time, and at this level we can see part of the ring structure. If you click the image (any of the three) you can enlarge it. If you do, you will see a common feature of data like these: the vast majority of tweets that get retweeted are only retweeted once or twice. Very few tweets, relative to the entire set, get retweeted one hundred times or more. So some of the retweets in the ring could be flame, as I mentioned before, but the ring could also just be an artifact of human attention dynamics.
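The skew in retweet counts is easy to check by tallying how many times each original tweet was retweeted and then counting how many tweets share each tally. In this Python sketch the tweet identifiers are placeholders, not values from the Occupy data set.

```python
from collections import Counter

# Hypothetical list with one entry per retweet, naming the original tweet.
retweeted_ids = ["t1", "t2", "t2", "t3", "t4", "t4", "t4", "t5"]

# Number of retweets received by each original tweet.
per_tweet = Counter(retweeted_ids)

# Histogram: retweet count -> number of tweets with that count.
histogram = Counter(per_tweet.values())

# Share of retweeted tweets that were retweeted only once or twice.
share_low = sum(n for count, n in histogram.items() if count <= 2) / len(per_tweet)

for count in sorted(histogram):
    print(count, "retweet(s):", histogram[count], "tweet(s)")
print("share retweeted once or twice:", share_low)
```

Plotting the histogram on log-log axes is a common way to see whether the tail is as heavy as it looks in the picture.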
So, art or fodder for analysis?
My own sense is that if I look at this image long enough, given what I have studied about social networks, communication theory and network structures, a great many questions come to mind. But I also admit to just getting a kick out of fiddling with R and plotting images I think are cool.