Analyzing R-bloggers

[This article was first published on The PolStat R Feed, and kindly contributed to R-bloggers.]

In the last two posts we saw how to download posts from R-bloggers, extract the title, author and date of each post, and write that information to a CSV file. Since we now have a nice data set from R-bloggers, we can start to examine how the site has developed over its lifetime. In this post I will look at the following patterns in the data:

  1. The rate of monthly posts submitted to R-bloggers
  2. The distribution of posts and contributors
  3. The top contributors in total and tabulated by year

The graph below shows the monthly count of posts submitted to R-bloggers.com:

As you can see, R-bloggers.com has experienced tremendous growth in posts. The first years, from 2005 to the end of 2008, were fairly consistent, with an average posting rate of 6 posts per month. In 2009 we see the beginning of a dramatic rise in submitted posts, which peaks in March 2011 with 266 posts that month. To see whether this is a function of a few very active bloggers, or whether we also see a similar increase in contributors, the graph below plots the number of unique contributors for every month:

Here we see that the monthly number of contributors closely follows the monthly number of posts; the rise in posts is therefore not exclusively the result of a few extremely active bloggers. However, as the figure below shows, most authors contribute a fairly small number of posts:

The distribution is extremely skewed, with a median of 6 posts and a few authors contributing 200 or more posts.
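The skew can also be checked numerically rather than only read off a plot. The following is a minimal sketch in base R on a small made-up vector of per-author post counts: the values in `counts` are hypothetical and are not the R-bloggers data. On the real data, the vector would presumably be the `count` column of the per-author table built in the full script at the end of this post.

```r
# Hypothetical per-author post counts; NOT the real R-bloggers data
counts <- c(1, 1, 2, 3, 4, 6, 6, 8, 12, 40, 250)

# In a right-skewed distribution a few large values pull the mean above the median
med <- median(counts)
avg <- mean(counts)

# Share of all posts written by authors at or above the 90th percentile
cutoff <- quantile(counts, 0.9)
top_share <- sum(counts[counts >= cutoff]) / sum(counts)

print(c(median = med, mean = avg, top_share = top_share))
```

A median well below the mean, combined with a large share of posts coming from the top decile of authors, is the numerical counterpart of a long right tail in the density plot.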

The overall top ten contributors to R-bloggers.com are:

author count
David Smith 647
xi’an 293
Thinking inside the box 217
Tal Galili 124
klr 104
Stephen Turner 102
dirk.eddelbuettel 94
Ralph 82
romain francois 79
C 77

Breaking this down by year, we can see that from 2009 onward some very active R bloggers emerge:

2005
author count
Hadley Wickham 3
fernandohrosa 2
2006
author count
seth 6
Hadley Wickham 5
dataninja 5
Di Cook 3
Vincent Zoonekynd's Blog 3
fernandohrosa 2
Andrew Gelman 1
2007
author count
Mario Pineda-Krch 20
Forester 14
Egon Willighagen 5
Andrew Gelman 4
Rob J Hyndman 4
dataninja 4
Hadley Wickham 3
John Johnson 2
dan 2
seth 2
2008
author count
Yu-Sung Su 28
Michal 9
Rob J Hyndman 8
Gregor Gorjanc 6
Forester 5
Di Cook 4
John Johnson 4
Mario Pineda-Krch 4
Radford Neal 4
abiao 4
2009
author count
Thinking inside the box 63
dirk.eddelbuettel 36
Shige 30
John Myles White 28
Paolo 26
David Smith 25
Todos Logos 25
Jeromy Anglim 24
Stephen Turner 23
romain francois 23
2010
author count
David Smith 352
xi’an 152
Thinking inside the box 85
C 75
Tal Galili 74
dirk.eddelbuettel 58
Ralph 53
romain francois 41
Stephen Turner 34
Kelly 33
2011
author count
David Smith 268
xi’an 137
klr 104
Thinking inside the box 66
BMS Add-ons » BMS Blog 58
Pat 52
Scott Chamberlain 48
Stephen Turner 44
Kay Cichini 43
Tal Galili 37

From 2009 onward, a number of authors appear among the top contributors every year, and in 2010 David Smith and xi’an rise to the top of the list, both with a massive output.

I see R-bloggers as one of the great services in the R community, and the presence of very knowledgeable and prolific contributors is a public good that we can all enjoy. So let's hope the current trend will continue into the new year!

As always, the full R script to reproduce the above analysis is here:
#read the libraries
library(plyr)
library(ggplot2)
library(xtable)
#set the working directory to where you saved the output.csv file from the previous post
setwd("/.../")
#read the data
data <- read.csv("output.csv")
#define the date variable and create the year and month variables
data$date <- as.Date(data$date, format = "%B %d %Y")
data$year <- as.POSIXlt(data$date)$year + 1900
data$month <- as.POSIXlt(data$date)$mon + 1
#get the monthly count of posts for every year
posts <- ddply(data, c("year","month"), function(x) data.frame(count = nrow(x)))
#for easier plotting create a date variable from the year and month
dates <- paste(posts$year,posts$month,"01", sep = "-")
posts$date <- as.Date(dates, format = "%Y-%m-%d")
#plot the monthly post count
plot <- ggplot(posts, aes(x = date, y = count)) + geom_line() + theme_bw() + ylab("Post Count")
plot
#get the number of monthly contributors
contributors <- ddply(data,c("year","month"), function(x) data.frame(contributors = length(unique(x$author))))
#for easier plotting create a date variable from the year and month
dates <- paste(contributors$year,contributors$month,"01", sep = "-")
contributors$date <- as.Date(dates, format = "%Y-%m-%d")
#plot the monthly count of contributors
plot <- ggplot(contributors, aes(x = date, y = contributors)) + geom_line() + theme_bw()
plot
#get the number of posts per author
authors <- ddply(data, "author", function(x) data.frame(count = nrow(x)))
#plot the density of contributions per author, hiding the x-axis ticks and labels
plot <- ggplot(authors, aes(x = count)) +
  geom_density(fill = "red", alpha = .3) +
  theme_bw() +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank())
plot
#get the ten authors with the highest post count
topten <- authors[order(authors$count, decreasing = TRUE)[1:10],]
print(xtable(topten), type = "html", include.rownames = FALSE)
#get the post count per author for every year
authorsYear <- ddply(data, c("author","year"), function(x) data.frame(count = nrow(x)))
#for every year get a table of the ten most prolific authors and print it as html
for (year in sort(unique(authorsYear$year))) {
  print(year)
  table <- authorsYear[authorsYear$year == year,]
  #head() keeps at most ten rows, so years with fewer than ten authors get no NA rows
  table <- head(table[order(table$count, decreasing = TRUE),], 10)
  print(xtable(table[,c("author","count")]), type = "html", include.rownames = FALSE)
}
