A segmented model of CRAN package growth
by Andrie de Vries
A few weeks ago I wrote about the growth of CRAN packages, where I demonstrated how to scrape CRAN archives to estimate the number of packages over time. In that post I briefly mentioned that the Ecdat package contains a dataset, CRANpackages, with snapshots of the package count recorded by John Fox and Spencer Graves.
Here is a plot of the data they collected. The dataset contains data through 2014, so I manually added the package count as of today (8,329).
Is there a decline in the rate of growth?
In my previous post, I asked the question: “are there indications that the contribution rate is steady, accelerating or decelerating?”
This question alludes to an analysis by John Fox, who wrote: “The number of packages on CRAN … has grown roughly exponentially, with residuals from the exponential trend … showing a recent decline in the rate of growth” (Fox, 2009).
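Fox's "exponential trend" amounts to a straight line fitted to the logarithm of the package count. Here is a minimal sketch in base R, assuming the snapshots arrive as a vector of `Date`s and a vector of counts. The helper name `fit_exponential_trend()` is hypothetical, and the example uses simulated numbers rather than the CRANpackages data.

```r
# Hypothetical helper: fit an exponential trend by regressing log(counts)
# on time. A straight line on the log scale is an exponential curve on the
# original scale.
fit_exponential_trend <- function(dates, counts) {
  days <- as.numeric(dates - min(dates))     # days since the first snapshot
  fit <- lm(log(counts) ~ days)
  daily_rate <- coef(fit)[["days"]]          # slope: log-growth per day
  list(
    fit = fit,
    annual_growth = exp(daily_rate * 365.25) - 1,  # implied growth per year
    residuals = residuals(fit)                      # log-scale departures from trend
  )
}

# Simulated example (not the CRAN data): log-growth rate of 0.0005 per day
dates <- as.Date("2001-01-01") + seq(0, 3650, by = 365)
counts <- 100 * exp(0.0005 * as.numeric(dates - min(dates)))
trend <- fit_exponential_trend(dates, counts)
trend$annual_growth   # about 0.2, i.e. roughly 20% a year
```

In this model, a run of negative residuals at the end of the series is what "a recent decline in the rate of growth" looks like.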
Segmented regression
In my previous post, “Using segmented regression to analyse world record running times”, I used segmented regression to estimate a piecewise linear model.
I used the same process to fit a segmented regression line through the CRAN package data.
By default the segmented package estimates a single break point. This analysis places the break point some time during 2008, which is entirely consistent with John Fox's observation that the rate of growth is slowing down.
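A one-break segmented model adds a hinge term, max(x - psi, 0), to a straight-line regression, so the slope changes by the hinge coefficient after the break psi. The segmented package estimates psi iteratively through its `segmented()` function. The sketch below swaps in a plain grid search over candidate break points instead, which is easy to follow and cheap for a few dozen snapshots. The helper `fit_one_break()` is hypothetical, and the data are simulated with a known break at x = 10.

```r
# Hypothetical helper: broken-stick regression with one break point, chosen
# by grid search (a stand-in for the iterative estimator in `segmented`).
# Assumes x is sorted ascending.
fit_one_break <- function(x, y, candidates = x[-c(1, length(x))]) {
  best <- NULL
  for (psi in candidates) {
    hinge <- pmax(x - psi, 0)   # 0 before the break, rises with slope 1 after it
    fit <- lm(y ~ x + hinge)    # slope is coef x before psi, x + hinge after
    rss <- sum(residuals(fit)^2)
    if (is.null(best) || rss < best$rss) {
      best <- list(psi = psi, fit = fit, rss = rss)
    }
  }
  best
}

# Simulated example (not the CRAN data): slope 0.3 up to x = 10, then 0.1
x <- 1:20
y <- ifelse(x <= 10, 0.3 * x, 3 + 0.1 * (x - 10))
one <- fit_one_break(x, y)
one$psi                                # 10
sum(coef(one$fit)[c("x", "hinge")])    # slope after the break, 0.1
```

For the CRAN series, x would be the snapshot date (for example as a decimal year). If you want each segment to represent a phase of exponential growth, y would be log(Packages); that choice is an assumption of this sketch, not something stated above.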
However, note that the segmented regression line doesn't fit the data very well between 2008 and 2012.
With a small amount of extra work you can fit segmented models with multiple break points. To do this, you simply have to specify initial values for the search. Here I show the results of a simple model with two break points. This model finds the first break point during 2007 and the second break point during 2011.
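Two break points just means a second hinge term. In the segmented package, the initial values for the search are supplied through the `psi` argument of `segmented()`. The sketch below sidesteps starting values by trying every ordered pair of candidate break points, which costs roughly n²/2 model fits and is only sensible for short series. `fit_two_breaks()` is a hypothetical name, and the data are simulated with breaks at x = 10 and x = 20.

```r
# Hypothetical helper: broken-stick regression with two break points, chosen
# by exhaustive search over ordered pairs of candidates (x sorted ascending).
fit_two_breaks <- function(x, y, candidates = x[-c(1, length(x))]) {
  best <- NULL
  n <- length(candidates)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      psi <- candidates[c(i, j)]
      h1 <- pmax(x - psi[1], 0)   # slope change at the first break
      h2 <- pmax(x - psi[2], 0)   # slope change at the second break
      fit <- lm(y ~ x + h1 + h2)
      rss <- sum(residuals(fit)^2)
      if (is.null(best) || rss < best$rss) {
        best <- list(psi = psi, fit = fit, rss = rss)
      }
    }
  }
  best
}

# Simulated example (not the CRAN data): slopes 0.4, then 0.2, then 0.05
x <- 1:30
y <- 0.4 * x - 0.2 * pmax(x - 10, 0) - 0.15 * pmax(x - 20, 0)
two <- fit_two_breaks(x, y)
two$psi   # 10 20
```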
Conclusion
Natural systems cannot maintain exponential growth forever. Some limit on the system will always, eventually, inhibit further growth. This is why many systems display some kind of sigmoid curve, or S curve.
Although the growth curve of CRAN packages shows signs of slowing down, it does not seem as if there is an inflexion point in the data. An inflexion point is where the curve transitions from being convex to being concave.
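One way to make the inflexion-point argument concrete: on an evenly spaced series, second differences approximate the second derivative, so convex stretches have positive second differences, concave stretches have negative ones, and an inflexion point shows up as a sign change. This is only a sketch; `inflexion_indices()` is hypothetical. The real CRAN snapshots are unevenly spaced and noisy, so you would need to interpolate or smooth them first, or small wiggles would register as spurious sign changes.

```r
# Hypothetical helper: return indices i such that an inflexion lies between
# y[i] and y[i + 1], assuming y is sampled on an evenly spaced grid.
inflexion_indices <- function(y) {
  d2 <- diff(y, differences = 2)   # d2[k] is centred on y[k + 1]
  # A strict sign change between d2[k] and d2[k + 1] puts the inflexion
  # between y[k + 1] and y[k + 2]; exact zeros are not counted as changes.
  k <- which(d2[-length(d2)] * d2[-1] < 0)
  k + 1L
}

# Simulated examples (not the CRAN data)
inflexion_indices(exp(seq(0, 3, by = 0.1)))        # integer(0): exponential, no inflexion
inflexion_indices(plogis(seq(-4.5, 4.5, by = 1)))  # 5: S curve, inflexion between y[5] and y[6]
```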
Thus it seems that the growth of CRAN packages will continue to look exponential for quite some time!
The code
As usual, here is the R code I used.