Using R to Visualize Information Flows on Wikipedia Talk Pages

Jeff Hemsley

10 years ago

[This article was first published on SoMe Lab » r-project, and kindly contributed to R-bloggers].

Wikipedia talk pages allow editors to discuss the evolving content of the related Wikipedia articles. Sometimes the topic of a page is controversial, and talk page threads can become heated, with posts invoking a wide range of values in the appeals they use in their arguments. For example, in one thread someone might argue that it is morally wrong to expose people to specific content, while others argue in favor of posting the content on the grounds that Wikipedia’s mission is to provide free access to information. For a social scientist interested in visualizing information flows, the question is: how do you visualize the change over time in overall thread volume and posts per thread, while also capturing which threads are rich in value-based appeals?

In this case I used a stacked spline approach in which the overall height gives you an idea of both the total threads and the total posts on any given day. For example, discussion on the talk page was heaviest between the 2nd and 5th, and then started to diminish. We can also see that this was the most active period in terms of the number of threads that people were posting to. To get an idea of the number of posts per thread, notice the vertical distance between the spline curves. Around the 19th, a large red bulge indicates that one thread received nearly all the posts for that day, and indeed probably the most nearly simultaneous posts of all the threads we looked at.
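For readers who want to try the idea, here is a minimal sketch of a stacked spline plot in base R. It is not the code behind the original graph: the `counts` matrix, the thread names, the colors, and the choice to clamp the splines at zero are all assumptions made for illustration.

```r
# Hypothetical posts-per-day counts: rows are days, columns are threads.
days <- 1:10
counts <- cbind(
  thread_a = c(0, 2, 5, 6, 4, 2, 1, 0, 0, 0),
  thread_b = c(0, 1, 3, 4, 3, 1, 0, 0, 1, 0),
  thread_c = c(0, 0, 0, 1, 1, 0, 0, 6, 2, 0)
)

# Smooth each thread's daily counts with an interpolating spline on a fine grid.
# Splines can dip below zero between points, so clamp at zero (an assumption).
grid <- seq(min(days), max(days), length.out = 200)
smooth <- apply(counts, 2, function(y) pmax(spline(days, y, xout = grid)$y, 0))

# Stack the threads: the running sum across threads is the upper edge of each
# band, and the previous thread's upper edge is the lower edge of the next.
upper <- t(apply(smooth, 1, cumsum))
lower <- cbind(0, upper[, -ncol(upper), drop = FALSE])

# Draw one filled band per thread; band thickness is that thread's posts.
cols <- c("steelblue", "orange", "red")
plot(range(grid), c(0, max(upper)), type = "n",
     xlab = "Day", ylab = "Posts (stacked by thread)")
for (j in seq_len(ncol(upper))) {
  polygon(c(grid, rev(grid)), c(upper[, j], rev(lower[, j])),
          col = cols[j], border = NA)
}
legend("topright", legend = colnames(counts), fill = cols, bty = "n")
```

The top edge of the last band is the smoothed total across all threads, and the vertical distance between neighboring curves is the smoothed post count for a single thread, which is how the graph is read in the paragraph above.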

We capture value richness by dividing the number of appeals in a thread by its number of posts. This is an imperfect measure for a number of reasons. For one, a single post can contain many different types of arguments (appeals). If we had a thread with 7 posts, only one of which contained appeals, and that post contained 7 appeals, we would have a density of 1. We would think we have a “hot topic” when in fact we may just have one person being argumentative. But when we add thread labels, we can use this measure to get an idea of which threads might be most interesting to look at.
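The density calculation, and the caveat about it, can be sketched in a few lines of base R. The `posts` data frame and its column names are hypothetical. The second measure (the share of posts containing any appeal) is not part of the original analysis and is included only to show how the 7-post example misleads.

```r
# Hypothetical coded data: one row per post, with the number of appeals in it.
# Thread A mirrors the example: 7 posts, one of which holds all 7 appeals.
posts <- data.frame(
  thread  = c(rep("A", 7), rep("B", 3)),
  appeals = c(7, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Density as described in the text: appeals per thread / posts per thread.
n_posts   <- tapply(posts$appeals, posts$thread, length)
n_appeals <- tapply(posts$appeals, posts$thread, sum)
density   <- n_appeals / n_posts

# An extra check (an assumption, not the author's measure): the share of
# posts in each thread that contain at least one appeal.
share_with_appeal <- tapply(posts$appeals > 0, posts$thread, mean)

print(data.frame(n_posts, n_appeals, density, share_with_appeal))
```

Both threads have a density of 1, but in thread A only one post in seven contains any appeal, while in thread B every post does.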

This graph was generated using R, the open source statistical program, with no special packages. Let me know if you have questions. I’ll be happy to post example code if there is interest!
