The Social Dynamics of the R Core Team
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers of the language. Inspired by these claims, I decided to look into this issue quantitatively by measuring the quantity of commits to R’s SVN repository that were made by each of the R Core developers. I wanted to know whether a small group of developers were overwhelmingly responsible for changes to R or whether all of the members of R Core had contributed equally. To follow along with what I did, you can grab the data and analysis scripts from GitHub.
First, I downloaded the R Core team’s SVN logs from http://developer.r-project.org/. I then used a simple regex to parse the SVN logs to count commits coming from each core committer.
After that, I tabulated the number of commits from each developer, pooling across the years 2003-2012 for which I had logs. You can see the results below, sorted by total commits in decreasing order:
Committer | Total Number of Commits |
---|---|
ripley | 22730 |
maechler | 3605 |
hornik | 3602 |
murdoch | 1978 |
pd | 1781 |
apache | 658 |
jmc | 599 |
luke | 576 |
urbaneks | 414 |
iacus | 382 |
murrell | 324 |
leisch | 274 |
tlumley | 153 |
rgentlem | 141 |
root | 87 |
duncan | 81 |
bates | 76 |
falcon | 45 |
deepayan | 40 |
plummer | 28 |
ligges | 24 |
martyn | 20 |
ihaka | 14 |
After that, I tried to visualize evolving trends over the years. First, I visualized the number of commits per developer per year:
And then I visualized the evenness of contributions from different developers by measuring the entropy of the distribution of commits on a yearly basis:
There seems to be some weak evidence that the community is either finding consensus more difficult and tending towards a single leader who makes final decisions or that some developers are progressively dropping out because of the difficulty of achieving consensus. There is unambiguous evidence that a single developer makes the overwhelming majority of commits to R’s SVN repo.
I leave it to others to understand what all of this means for R and for programming language communities in general.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.