
CRAN “Golden Oldies” and “One Hit Wonders”

[This article was first published on R Views, and kindly contributed to R-bloggers.]

If you are a regular R Views reader, then you know that every month I post my Top 40 picks for new CRAN packages. In this post, I’ll borrow some additional terminology from the Top 40 AM radio of my youth, and talk about two new categories of CRAN packages: Golden Oldies and One Hit Wonders.

Golden Oldies

On Top 40 radio, Golden Oldies are tunes that continue to get a lot of air play decades later and are generally thought of as classics of 1950s, 1960s, and 1970s pop culture. By a Golden Oldie R package, I mean a package that has been around for a long time, has gone through many versions of bug fixes and enhancements, and is considered indispensable by some part of the R community.

For example, the nlme and lme4 R packages for fitting mixed models would surely be on the Golden Oldie list of any biostatistician who works in R. nlme has been around since 1999 and has had 139 versions published, while lme4 was introduced in 2003 and has had 118 so far. The download statistics indicate that lme4 has largely replaced nlme, but nlme is still getting some serious play time.

library(dlstats)  # cran_stats(): monthly CRAN download counts
library(ggplot2)
d_stats_mm <- cran_stats(c("nlme", "lme4"))
ggplot(d_stats_mm, aes(end, downloads, group = package, color = package)) +
  geom_line() + ggtitle("Downloads for Mixed Model Packages") + xlab("Month")
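One way to put a number on "largely replaced" is to compute each package's share of the combined downloads in every month. The sketch below assumes a data frame in the same long format that the plotting code relies on (one row per package per month, with `end`, `package`, and `downloads` columns); the package names `oldpkg` and `newpkg` and all of the counts are made up purely for illustration, so nothing here reflects real nlme or lme4 numbers.

```r
# Made-up monthly download counts for two hypothetical packages, laid out in
# the same long format as cran_stats() output (one row per package per month)
d <- data.frame(
  end       = as.Date(rep(c("2021-10-31", "2021-11-30", "2021-12-31"), each = 2)),
  package   = rep(c("oldpkg", "newpkg"), times = 3),
  downloads = c(100, 300, 120, 330, 90, 310)
)

# Total downloads in each month, then each package's share of that total
d$month_total <- ave(d$downloads, d$end, FUN = sum)
d$share <- d$downloads / d$month_total

# Share of monthly downloads going to the newer package
d[d$package == "newpkg", c("end", "share")]
```

With real `cran_stats()` output, a share that stays well above one half month after month would be one rough way to back up a "replacement" claim.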

In addition to signalling usefulness, Golden Oldieness also serves as a proxy for quality. I believe that most R users would agree that packages that have been lovingly maintained over many years and are still frequently used are the most likely to contain code you can count on. This is the kind of risk-mitigation metric that is driving the R Validation Hub effort.

The following is a list of ten packages that I count as Golden Oldies. They have all been around for at least 13 years, and they all still get considerable play.

10 Golden Oldies

Package      First published   Versions published
quantreg     1999-01-11         67
nlme         1999-11-23        139
survival     2001-06-22         94
lme4         2003-06-25        118
Hmisc        2003-07-17         70
zoo          2004-02-20         64
data.table   2006-04-15         56
ggplot2      2007-06-10         41
xts          2008-01-05         37
Rcpp         2008-11-06         91
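To make the two ingredients of Golden Oldieness (age and a long release history) concrete, the sketch below encodes the table as a data frame and computes years on CRAN and versions per year. The reference date of 2022-02-16 is an assumption (it is the latest publication date in the CRAN snapshot summarized later in this post), and the cut-offs of 13 years and 30 versions are a hypothetical rule chosen for illustration, not a definition from R Views.

```r
# The ten Golden Oldies from the table above
golden <- data.frame(
  pkg = c("quantreg", "nlme", "survival", "lme4", "Hmisc",
          "zoo", "data.table", "ggplot2", "xts", "Rcpp"),
  first_pub = as.Date(c("1999-01-11", "1999-11-23", "2001-06-22",
                        "2003-06-25", "2003-07-17", "2004-02-20",
                        "2006-04-15", "2007-06-10", "2008-01-05",
                        "2008-11-06")),
  num_pub = c(67, 139, 94, 118, 70, 64, 56, 41, 37, 91)
)

# Assumed reference date: roughly when the CRAN snapshot was taken
ref_date <- as.Date("2022-02-16")

# Years on CRAN (day difference converted to years) and releases per year
golden$years    <- as.numeric(ref_date - golden$first_pub) / 365.25
golden$per_year <- golden$num_pub / golden$years

# Hypothetical Golden Oldie rule: at least 13 years old, at least 30 versions
golden$is_oldie <- golden$years >= 13 & golden$num_pub >= 30

golden[order(-golden$per_year), c("pkg", "years", "per_year", "is_oldie")]
```

Dividing by age puts long-lived packages and younger but very active ones on the same footing, which a raw version count does not.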

One Hit Wonders

On Top 40 radio, One Hit Wonders could also be Golden Oldies. The designation simply meant that the recording artist had one big hit and then never found their way back into the Top 40. (Here is a One Hit Wonder favorite of mine from 1969.)

For CRAN packages, I am using One Hit Wonder to mean a package that has been on CRAN for many years but has never had a version bump: no bug fixes, no upgrades, no faults, just the opposite of a Golden Oldie. Finding them is not so easy, though. As far as I can tell, there is no convenient way to access the CRAN archive metadata. However, you can hunt for One Hit Wonders in the long left tail of the distribution of CRAN publication dates: the publication date CRAN reports is that of a package's current version, so a package with a very old publication date has not been updated since.
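If a full release history were available, the One Hit Wonder definition would reduce to counting releases per package. The sketch below assumes a hypothetical history with one row per published version; the package names, dates, and the 2012 cut-off for "many years" are all invented for illustration and do not come from CRAN.

```r
# Hypothetical release history: one row per published version
releases <- data.frame(
  pkg  = c("oldHit", "steady", "steady", "steady", "newbie"),
  date = as.Date(c("2008-05-01", "2007-03-10", "2012-09-01",
                   "2021-11-30", "2021-06-15"))
)
cutoff <- as.Date("2012-01-01")  # assumed threshold for "on CRAN for many years"

# Group the release dates by package
by_pkg <- split(releases$date, releases$pkg)

# Number of published versions and first release date (as a day count)
n_versions <- vapply(by_pkg, length, integer(1))
first_num  <- vapply(by_pkg, function(d) as.numeric(min(d)), numeric(1))

# One Hit Wonder: exactly one version, first published before the cut-off
one_hit <- names(n_versions)[n_versions == 1 & first_num < as.numeric(cutoff)]
one_hit
```

Here `steady` is excluded because it has three versions and `newbie` because its single release is recent, leaving only `oldHit`.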

This code picks out the publication dates for CRAN packages.

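library(cranly)  # clean_CRAN_db(), build_network()
library(dplyr)   # %>%, filter(), arrange()

# Fetch the current CRAN package database and clean it with cranly
c_db <- tools::CRAN_package_db()
cran_db <- clean_CRAN_db(c_db)
package_network <- cran_db %>% build_network(perspective = "package")

# Package names and the publication dates of their current versions
pkg <- package_network$nodes$package
pub <- package_network$nodes$published

published <- na.omit(data.frame(pkg, pub))
# The date literal must be quoted; unquoted, 2005-01-01 is arithmetic (= 2003)
clean_pub <- published %>% filter(pub > as.Date("2005-01-01"))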
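A small R pitfall is worth spelling out for date filters like this one: an unquoted `2005-01-01` is parsed as subtraction, not as a date. The base-R sketch below shows what such a literal turns into, and why only the quoted form gives the intended cut-off; it uses no packages and no data from the post.

```r
# Unquoted, 2005-01-01 is arithmetic: 2005 minus 1 minus 1
unquoted <- 2005-01-01
unquoted == 2003

# Treated as a day count from the given origin, 2003 lands in mid-1905,
# so a filter on dates later than this keeps essentially everything
as.Date(unquoted, origin = "1900-01-01")

# Quoting the literal gives the intended cut-off of January 1, 2005
cutoff <- as.Date("2005-01-01")
format(cutoff, "%Y-%m-%d")
```

Because the oldest current CRAN publication date is in 2006, either form happens to keep the same packages here, but only the quoted one expresses the intended condition.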

And here is the histogram. The dotted vertical lines mark the first quartile and the median.

ggplot(clean_pub, aes(x = pub)) + geom_histogram(bins = 193) +
  # Dotted grey lines at the median and at the first quartile (type = 1 picks an observed date)
  geom_vline(xintercept = median(clean_pub$pub), col = "grey", linewidth = 1, linetype = "dotted") +
  geom_vline(xintercept = quantile(clean_pub$pub, probs = 0.25, type = 1), col = "grey", linewidth = 1, linetype = "dotted") +
  xlab("CRAN Package Publication Date") + ylab("Number of packages") + ggtitle("Histogram of Publication Dates")
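The `type = 1` argument to `quantile()` matters for dates: it inverts the empirical distribution function, so every quantile it returns is one of the observed dates rather than an interpolation between two of them. The base-R sketch below illustrates this on five made-up publication dates; they are not drawn from the CRAN data.

```r
# A small made-up vector of publication dates, in no particular order
dates <- as.Date(c("2020-08-07", "2006-03-15", "2021-08-13",
                   "2012-05-01", "2018-06-19"))

# type = 1 inverts the empirical CDF: each quantile is an observed date
q <- quantile(dates, probs = c(0.25, 0.5), type = 1)
q
```

With five dates, the first quartile is the second-oldest date and the median is the third, which is the same rule used to cut off the left tail of the CRAN distribution.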

When I recently ran this code, the oldest current publication date among the 18,970 packages on CRAN was March 15, 2006.

summary(clean_pub)
##      pkg                 pub            
##  Length:18970       Min.   :2006-03-15  
##  Class :character   1st Qu.:2018-06-19  
##  Mode  :character   Median :2020-08-07  
##                     Mean   :2019-09-22  
##                     3rd Qu.:2021-08-13  
##                     Max.   :2022-02-16

The following code extracts the packages in the first quartile of publication dates and sorts them from oldest to newest. Determining whether a package's publication date is also its original publication date, however, requires verifying that no earlier version of the package has ever been archived.

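# First quartile of the publication dates (an observed date, since type = 1)
q1_pub <- quantile(clean_pub$pub, probs = 0.25, type = 1)
# Keep packages published on or before that date, oldest first
q1_pkgs <- clean_pub %>% filter(pub <= q1_pub) %>% arrange(pub)
q1_pkgs[1:20, ]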
##              pkg        pub
## 1      coxrobust 2006-03-15
## 2  BayesValidate 2006-03-30
## 3       fuzzyFDR 2007-10-16
## 4         poilog 2008-04-29
## 5        SASPECT 2008-06-23
## 6            RM2 2008-08-13
## 7           pack 2008-09-08
## 8         expert 2008-10-02
## 9            kzs 2008-10-28
## 10           ETC 2009-01-30
## 11 CreditMetrics 2009-02-01
## 12   Reliability 2009-02-01
## 13           spe 2009-02-24
## 14          mcsm 2009-04-28
## 15    SEMModComp 2009-05-05
## 16   bootStepAIC 2009-06-04
## 17      HybridMC 2009-06-08
## 18    PearsonICA 2009-06-29
## 19    crantastic 2009-08-08
## 20         km.ci 2009-08-30

Eyeballing the package documentation on CRAN revealed that nine of these twenty oldest packages in the left tail are One Hit Wonders. Here are their download stats.

But what can you say about these One Hit Wonders? Except for fuzzyFDR, the nine don’t seem to be getting much action. The small download numbers, however, do not by themselves signal a lack of quality or relevance. There may indeed be some treasures among the One Hit Wonders. They survive on CRAN because, unlike in the music business, as long as a package plays and does not break any other package, the CRAN D.J.s keep it in the playlist.

So, are you the kind of person who enjoys sorting through old records: 33s, 45s, and maybe even 78s? Would you visit a shop like Rough Trade, the A-1 Record Shop, or Stranded Records on a trip to New York City? If so, maybe you would enjoy looking through CRAN for esoteric packages like BayesValidate, which supports an interesting paper on validating Bayesian models. Wouldn’t it be special to find a One Hit Wonder out there on CRAN in mint condition, needing no bug fixes or enhancements, that does something really sweet?
