R trends in 2015 (based on cranlogs)

[This article was first published on R – G-Forge, and kindly contributed to R-bloggers].
What are the current tRends? The image is CC from coco + kelly.
It is always fun to look back and reflect on the past year. Inspired by Christoph Safferling’s post on top packages published in 2015, I decided to have my own go at the top R trends of 2015. In contrast to Safferling’s post, I’ll also try to (1) look at packages from previous years that hit the big league, (2) see what top R coders we have in the community, and then (3) round up with my own 2015 R experience. Everything in this post is based on the CRANberries reports. To harvest the information I’ve borrowed shamelessly from Safferling’s post, with some modifications. He used the number of downloads as a proxy for the package release date, while I decided to use the actual release date; if that wasn’t available, I scraped it off the CRAN servers. The script now also retrieves package author(s) and description (see the code below for details).
library(rvest)
library(dplyr)
# devtools::install_github("hadley/multidplyr")
library(multidplyr)
library(magrittr)
library(lubridate)
 
getCranberriesElmnt <- function(txt, elmnt_name){
  desc <- grep(sprintf("^%s:", elmnt_name), txt)
  if (length(desc) == 1){
    txt <- txt[desc:length(txt)]
    end <- grep("^[A-Za-z/@]{2,}:", txt[-1])
    if (length(end) == 0)
      end <- length(txt)
    else
      end <- end[1]
 
    desc <-
      txt[1:end] %>% 
      gsub(sprintf("^%s: (.+)", elmnt_name),
           "\\1", .) %>% 
      paste(collapse = " ") %>% 
      gsub("[ ]{2,}", " ", .) %>% 
      gsub(" , ", ", ", .)
  }else if (length(desc) == 0){
    desc <- paste("No", tolower(elmnt_name))
  }else{
    # More than one matching field was found, which is unexpected
    stop("Expected at most one ", elmnt_name, " field in text:\n",
         paste(txt, collapse = "\n"))
  }
  return(desc)
}
 
convertCharset <- function(txt){
  if (grepl("Windows", Sys.info()["sysname"]))
    txt <- iconv(txt, from = "UTF-8", to = "cp1252")
  return(txt)
}
 
getAuthor <- function(txt, package){
  author <- getCranberriesElmnt(txt, "Author")
  if (grepl("No author|See AUTHORS file", author)){
    author <- getCranberriesElmnt(txt, "Maintainer")
  }
 
  if (grepl("(No m|M)aintainer|(No a|A)uthor|^See AUTHORS file", author) || 
      is.null(author) ||
      nchar(author)  <= 2){
    cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
                                  package))
    author <- cran_txt %>% 
      html_nodes("tr") %>% 
      html_text %>% 
      convertCharset %>% 
      gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>% 
      .[grep("^Author", .)] %>% 
      gsub(".*\n", "", .)
 
    # If not found then the package has probably been
    # removed from the repository
    if (length(author) == 1)
      author <- author
    else
      author <- "No author"
  }
 
  # Remove stuff such as:
  # [cre, auth]
  # (worked on the...)
  # <my@email.com>
  # "John Doe"
  author %<>% 
    gsub("^Author: (.+)", 
         "\\1", .) %>% 
    gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>% 
    gsub("\\([^)]+\\)", " ", .) %>% 
    gsub("([ ]*<[^>]+>)", " ", .) %>% 
    gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>% 
    gsub("[ ]{2,}", " ", .) %>% 
    gsub("(^[ '\"]+|[ '\"]+$)", "", .) %>% 
    gsub(" , ", ", ", .)
  return(author)
}
 
getDate <- function(txt, package){
  date <- 
    grep("^Date/Publication", txt)
  if (length(date) == 1){
    date <- txt[date] %>% 
      gsub("Date/Publication: ([0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2}).*",
           "\\1", .)
  }else{
    cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
                                  package))
    date <- 
      cran_txt %>% 
      html_nodes("tr") %>% 
      html_text %>% 
      convertCharset %>% 
      gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>% 
      .[grep("^Published", .)] %>% 
      gsub(".*\n", "", .)
 
 
    # The main page doesn't contain the original date if 
    # new packages have been submitted, we therefore need
    # to check first entry in the archives
    if(cran_txt %>% 
       html_nodes("tr") %>% 
       html_text %>% 
       gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>% 
       grepl("^Old.{1,4}sources", .) %>% 
       any){
      archive_txt <- read_html(sprintf("http://cran.r-project.org/src/contrib/Archive/%s/",
                                       package))
      pkg_date <- 
        archive_txt %>% 
        html_nodes("tr") %>% 
        lapply(function(x) {
          nodes <- html_nodes(x, "td")
          if (length(nodes) == 5){
            return(nodes[3] %>% 
                     html_text %>% 
                     as.Date(format = "%d-%b-%Y"))
          }
        }) %>% 
        .[sapply(., length) > 0] %>% 
        .[!sapply(., is.na)] %>% 
        head(1)
 
      if (length(pkg_date) == 1)
        date <- pkg_date[[1]]
    }
  }
  date <- tryCatch({
    as.Date(date)
  }, error = function(e){
    "Date missing"
  })
  return(date)
}
 
getNewPkgStats <- function(published_in){
  # The parallel is only for making cranlogs requests
  # we can therefore have more cores than actual cores
  # as this isn't processor intensive while there is
  # considerable wait for each http-request
  cl <- create_cluster(parallel::detectCores() * 4)
  parallel::clusterEvalQ(cl, {
    library(cranlogs)
  })
  set_default_cluster(cl)
  on.exit(stop_cluster())
 
  berries <- read_html(paste0("http://dirk.eddelbuettel.com/cranberries/", published_in, "/"))
  pkgs <- 
    # Select the divs of the package class
    html_nodes(berries, ".package") %>% 
    # Extract the text
    html_text %>% 
    # Split the lines
    strsplit("[\n]+") %>% 
    # Now clean the lines
    lapply(.,
           function(pkg_txt) {
             pkg_txt[sapply(pkg_txt, function(x) { nchar(gsub("^[ \t]+", "", x)) > 0}, 
                            USE.NAMES = FALSE)] %>% 
               gsub("^[ \t]+", "", .) 
           })
 
  # Now we select the new packages
  new_packages <- 
    pkgs %>% 
    # The first line is key as it contains the text "New package"
    sapply(., function(x) x[1], USE.NAMES = FALSE) %>% 
    grep("^New package", .) %>% 
    pkgs[.] %>% 
    # Now we extract the package name and the date that it was published
    # and merge everything into one table
    lapply(function(txt){
      txt <- convertCharset(txt)
      ret <- data.frame(
        name = gsub("^New package ([^ ]+) with initial .*", 
                     "\\1", txt[1]),
        stringsAsFactors = FALSE
      )
 
      ret$desc <- getCranberriesElmnt(txt, "Description")
      ret$author <- getAuthor(txt, ret$name)
      ret$date <- getDate(txt, ret$name)
 
      return(ret)
    }) %>% 
    rbind_all %>% 
    # Get the download data in parallel
    partition(name) %>% 
    do({
      down <- cran_downloads(.$name[1], 
                             from = max(as.Date("2015-01-01"), .$date[1]), 
                             to = "2015-12-31")$count 
      cbind(.[1,],
            data.frame(sum = sum(down), 
                       avg = mean(down))
      )
    }) %>% 
    collect %>% 
    ungroup %>% 
    arrange(desc(avg))
 
  return(new_packages)
}
 
pkg_list <- 
  lapply(2010:2015,
         getNewPkgStats)
 
pkgs <- 
  rbind_all(pkg_list) %>% 
  mutate(time = as.numeric(as.Date("2016-01-01") - date),
         year = format(date, "%Y"))
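
The field extraction used by the script above can be tried on a small, made-up entry without touching the network. The sketch below is not part of the original script: the package `demoPkg`, its author and its dates are hypothetical, and the logic is a simplified version of `getCranberriesElmnt`, `getAuthor` and `getDate` that assumes the entry has already been split into lines.

```r
# Hypothetical CRANberries entry, already split into lines
txt <- c("New package demoPkg with initial version 0.1.0",
         "Package: demoPkg",
         "Description: Tools for demonstrating",
         "  multi-line fields.",
         "Author: Jane Doe [aut, cre] <jane@example.org>",
         "Date/Publication: 2015-03-14 09:26:53")

# Package name: keep only the first capture group via the "\\1" backreference
name <- gsub("^New package ([^ ]+) with initial .*", "\\1", txt[1])

# Description: a field runs until the next line that looks like "Key:"
start <- grep("^Description:", txt)
rest <- txt[start:length(txt)]
next_field <- grep("^[A-Za-z/@]{2,}:", rest[-1])[1]
desc <- paste(rest[1:next_field], collapse = " ")
desc <- gsub("^Description: (.+)", "\\1", desc)
desc <- gsub("[ ]{2,}", " ", desc)

# Author: drop roles in [brackets] and e-mail addresses in <angle brackets>
author <- gsub("^Author: (.+)", "\\1", grep("^Author:", txt, value = TRUE))
author <- gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", author)
author <- gsub("[ ]*<[^>]+>", " ", author)
author <- gsub("(^[ ]+|[ ]+$)", "", author)

# Publication date: capture the ISO date and convert it
pub_date <- as.Date(gsub("Date/Publication: ([0-9]{4}-[0-9]{2}-[0-9]{2}).*",
                         "\\1",
                         grep("^Date/Publication", txt, value = TRUE)))

print(list(name = name, desc = desc, author = author, date = pub_date))
```

Note that the backreference has to be written as `"\\1"` in R source, since a single backslash inside a string literal is consumed by the R parser before the regular expression engine sees it.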

Downloads and time on CRAN

The longer a package has been on CRAN, the more it gets downloaded. We can illustrate this using simple linear regression; slightly surprisingly, the relationship is mostly linear:
pkgs %<>% 
  mutate(time_yrs = time/365.25)
fit <- lm(avg ~ time_yrs, data = pkgs)
 
# Test for non-linearity
library(splines)
anova(fit,
      update(fit, .~.-time_yrs+ns(time_yrs, 2)))
Analysis of Variance Table

Model 1: avg ~ time
Model 2: avg ~ ns(time, 2)
  Res.Df       RSS Df Sum of Sq      F Pr(>F)
1   7348 189661922                           
2   7347 189656567  1    5355.1 0.2075 0.6488
Here the average number of downloads per day increases by about 5 for each year on CRAN. It can easily be argued that the average number of downloads isn’t that interesting since the data is skewed; we can therefore also look at the upper quantiles using quantile regression:
library(quantreg)
library(htmlTable)
lapply(c(.5, .75, .95, .99),
       function(tau){
         rq_fit <- rq(avg ~ time_yrs, data = pkgs, tau = tau)
         rq_sum <- summary(rq_fit)
         c(Estimate = txtRound(rq_sum$coefficients[2, 1], 1), 
           `95 % CI` = txtRound(rq_sum$coefficients[2, 1] + 
                                        c(1,-1) * rq_sum$coefficients[2, 2], 1) %>% 
             paste(collapse = " to "))
       }) %>% 
  do.call(rbind, .) %>% 
  htmlTable(rnames = c("Median",
                       "Upper quartile",
                       "Top 5%",
                       "Top 1%"))
 | Estimate | 95 % CI
Median | 0.6 | 0.6 to 0.6
Upper quartile | 1.2 | 1.2 to 1.1
Top 5% | 9.7 | 11.9 to 7.6
Top 1% | 182.5 | 228.2 to 136.9
The above table conveys a slightly more interesting picture. Most packages don’t get that much attention while the top 1% truly reach the masses.
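
To see why the mean hides most of the story for skewed data, here is a minimal sketch on simulated numbers, not on the CRAN data. The log-normal distribution and its parameters are assumptions chosen only to mimic a long right tail; the variable names are made up for the illustration.

```r
set.seed(2015)

# Simulated (hypothetical) average daily downloads for 10,000 packages
sim_avg <- rlnorm(10000, meanlog = 1, sdlog = 2)

# The mean is pulled upwards by a small number of very popular packages ...
mean_avg <- mean(sim_avg)

# ... while the quantiles show where the bulk of the packages actually sit
qs <- quantile(sim_avg, probs = c(.5, .75, .95, .99))

print(round(c(Mean = mean_avg, qs), 1))
```

In data shaped like this, the median sits well below the mean, which is the same pattern the quantile regression picks up when the slope for the top 1% is far steeper than the slope for the median.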

Top downloaded packages

In order to investigate what packages R users have been using during 2015, I’ve looked at all new packages since the turn of the decade. Since each year of CRAN presence increases the download rates, I’ve split the table by package release year. The results are available for browsing below (yes, it is the brand new interactive htmlTable that allows you to collapse cells; note that it may not work if you are reading this on R-bloggers, and the link is lost under certain circumstances).
Name | Author | Total downloads | Average downloads/day | Description
Top 10 packages published in 2015
xml2 | Hadley Wickham, Jeroen Ooms, RStudio, R Foundation | 348,222 | 1635 | Work with XML files using a simple, consistent interface. Built on top of the ‘libxml2’ C library.
rversions | Gabor Csardi | 386,996 | 1524 | Query the main R SVN repository to find the versions r-release and r-oldrel refer to, and also all previous R versions and their release dates.
git2r | Stefan Widgren | 411,709 | 1303 | Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and running some basic git commands.
praise | Gabor Csardi, Sindre Sorhus | 96,187 | 673 | Build friendly R packages that praise their users if they have done something good, or they just need it to feel better.
readxl | David Hoerl | 99,386 | 379 | Import excel files into R. Supports ‘.xls’ via the embedded ‘libxls’ C library (http://sourceforge.net/projects/libxls/) and ‘.xlsx’ via the embedded ‘RapidXML’ C++ library (http://rapidxml.sourceforge.net). Works on Windows, Mac and Linux without external dependencies.
readr | Hadley Wickham, Romain Francois, R Core Team, RStudio | 90,022 | 337 | Read flat/tabular text files from disk.
DiagrammeR | Richard Iannone | 84,259 | 236 | Create diagrams and flowcharts using R.
visNetwork | Almende B.V. (vis.js library in htmlwidgets/lib, | 41,185 | 233 | Provides an R interface to the ‘vis.js’ JavaScript charting library. It allows an interactive visualization of networks.
plotly | Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, Pedro Despouy | 9,745 | 217 | Easily translate ggplot2 graphs to an interactive web-based version and/or create custom web-based visualizations directly from R. Once uploaded to a plotly account, plotly graphs (and the data behind them) can be viewed and modified in a web browser.
DT | Yihui Xie, Joe Cheng, jQuery contributors, SpryMedia Limited, Brian Reavis, Leon Gersen, Bartek Szopka, RStudio Inc | 24,806 | 120 | Data objects in R can be rendered as HTML tables using the JavaScript library ‘DataTables’ (typically via R Markdown or Shiny). The ‘DataTables’ library has been included in this R package. The package name ‘DT’ is an abbreviation of ‘DataTables’.
Top 10 packages published in 2014
stringi | Marek Gagolewski and Bartek Tartanus ; IBM and other contributors ; Unicode, Inc. | 1,316,900 | 3608 | stringi allows for very fast, correct, consistent, and convenient character string/text processing in each locale and any native encoding. Thanks to the use of the ICU library, the package provides R users with a platform-independent functionality known to Java, Perl, Python, PHP and Ruby programmers.
magrittr | Stefan Milton Bache and Hadley Wickham | 1,245,662 | 3413 | Provides a mechanism for chaining commands with a new forward-pipe operator. Ceci n’est pas un pipe.
mime | Yihui Xie | 1,038,591 | 2845 | This package guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems.
R6 | Winston Chang | 920,147 | 2521 | The R6 package allows the creation of classes with reference semantics, similar to R’s built-in reference classes. Compared to reference classes, R6 classes are simpler and lighter-weight, and they are not built on S4 classes so they do not require the methods package. These classes allow public and private members, and they support inheritance.
dplyr | Hadley Wickham, Romain Francois | 778,311 | 2132 | A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
manipulate | JJ Allaire, RStudio | 626,191 | 1716 | Interactive plotting functions for use within RStudio. The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, checkbox, or button) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn.
htmltools | RStudio, Inc. | 619,171 | 1696 | Tools for HTML generation and output
curl | Jeroen Ooms | 599,704 | 1643 | The curl() function provides a drop-in replacement for base url() with better performance and support for http 2.0, ssl (https, ftps), gzip, deflate and other libcurl goodies. This interface is implemented using the RConnection API in order to support incremental processing of both binary and text streams. If you are looking for a more user friendly http client, try the RCurl or httr packages instead.
lazyeval | Hadley Wickham, RStudio | 572,546 | 1569 | A disciplined approach to non-standard evaluation.
rstudioapi | RStudio | 515,665 | 1413 | This package provides functions to make it easy to access the RStudio API when available, and provide informative error messages when not.
Top 10 packages published in 2013
jsonlite | Jeroen Ooms, Duncan Temple Lang | 906,421 | 2483 | This package is a fork of the RJSONIO package by Duncan Temple Lang. It builds on the parser from RJSONIO, but implements a different mapping between R objects and JSON strings. The C code in this package is mostly from Temple Lang, the R code has been rewritten from scratch. In addition to drop-in replacements for fromJSON and toJSON, the package has functions to serialize objects. Furthermore, the package contains a lot of unit tests to make sure that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
BH | John W. Emerson, Michael J. Kane, Dirk Eddelbuettel, JJ Allaire, and Romain Francois | 691,280 | 1894 | Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile-time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN package. By placing these libraries in this package, we offer a more efficient distribution system for CRAN as replication of this code in the sources of other packages is avoided.
highr | Yihui Xie and Yixuan Qiu | 641,052 | 1756 | This package provides syntax highlighting for R source code. Currently it supports LaTeX and HTML output. Source code of other languages can be supported via Andre Simon’s Highlight package.
assertthat | Hadley Wickham | 527,961 | 1446 | assertthat is an extension to stopifnot() that makes it easy to declare the pre and post conditions that you code should satisfy, while also producing friendly error messages so that your users know what they’ve done wrong.
httpuv | RStudio, Inc. | 310,699 | 851 | httpuv provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)
NLP | Kurt Hornik | 270,682 | 742 | Basic classes and methods for Natural Language Processing.
TH.data | Torsten Hothorn | 242,060 | 663 | Contains data sets used on other packages I maintain.
NMF | Renaud Gaujoux, Cathal Seoighe | 228,807 | 627 | This package provides a framework to perform Non-negative Matrix Factorization (NMF). It implements a set of already published algorithms and seeding methods, and provides a framework to test, develop and plug new/custom algorithms. Most of the built-in algorithms have been optimized in C++, and the main interface function provides an easy way of performing parallel computations on multicore machines.
stringdist | Mark van der Loo | 123,138 | 337 | Implements the Hamming distance and weighted versions of the Levenshtein, restricted Damerau-Levenshtein (optimal string alignment), and Damerau-Levenshtein distance.
SnowballC | Milan Bouchet-Valat | 104,411 | 286 | An R interface to the C libstemmer library that implements Porter’s word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
Top 10 packages published in 2012
gtable | Hadley Wickham | 1,091,440 | 2990 | Tools to make it easier to work with “tables” of grobs.
knitr | Yihui Xie | 792,876 | 2172 | This package provides a general-purpose tool for dynamic report generation in R, which can be used to deal with any type of (plain text) files, including Sweave and HTML. The patterns of code chunks and inline R expressions can be customized. R code is evaluated as if it were copied and pasted in an R terminal thanks to the evaluate package (e.g. we do not need to explicitly print() plots from ggplot2 or lattice). R code can be reformatted by the formatR package so that long lines are automatically wrapped, with indent and spaces being added, and comments being preserved. A simple caching mechanism is provided to cache results from computations for the first time and the computations will be skipped the next time. Almost all common graphics devices, including those in base R and add-on packages like Cairo, cairoDevice and tikzDevice, are built-in with this package and it is straightforward to switch between devices without writing any special functions. The width and height as well as alignment of plots in the output document can be specified in chunk options (the size of plots for graphics devices is still supported as usual). Multiple plots can be recorded in a single code chunk, and it is also allowed to rearrange plots to the end of a chunk or just keep the last plot. Warnings, messages and errors are written in the output document by default (can be turned off). Currently LaTeX, HTML and Markdown are supported, and other output formats can be supported by hook functions. The large collection of hooks in this package makes it possible for the user to control almost everything in the R code input and output. Hooks can be used either to format the output or to run a specified R code fragment before or after a code chunk. Most features are borrowed or inspired by Sweave, cacheSweave, pgfSweave, brew and decumar.
httr | Hadley Wickham | 785,568 | 2152 | Provides useful tools for working with HTTP connections. Is a simplified wrapper built on top of RCurl. It is much much less configurable but because it only attempts to encompass the most common operations it is also much much simpler.
markdown | JJ Allaire, Jeffrey Horner, Vicent Marti, and Natacha Porte | 636,888 | 1745 | Markdown is a plain-text formatting syntax that can be converted to XHTML or other formats. This package provides R bindings to the Sundown markdown rendering library.
Matrix | Douglas Bates and Martin Maechler | 470,468 | 1289 | Classes and methods for dense and sparse matrices and operations on them using Lapack and SuiteSparse.
shiny | RStudio, Inc. | 427,995 | 1173 | Shiny makes it incredibly easy to build interactive web applications with R. Automatic “reactive” binding between inputs and outputs and extensive pre-built widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
lattice | Deepayan Sarkar | 414,716 | 1136 | Lattice is a powerful and elegant high-level data visualization system, with an emphasis on multivariate data, that is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
pkgmaker | Renaud Gaujoux | 225,796 | 619 | This package provides some low-level utilities to use for package development. It currently provides managers for multiple package specific options and registries, vignette, unit test and bibtex related utilities. It serves as a base package for packages like NMF, RcppOctave, doRNG, and as an incubator package for other general purposes utilities, that will eventually be packaged separately. It is still under heavy development and changes in the interface(s) are more than likely to happen.
rngtools | Renaud Gaujoux | 225,125 | 617 | This package contains a set of functions for working with Random Number Generators (RNGs). In particular, it defines a generic S4 framework for getting/setting the current RNG, or RNG data that are embedded into objects for reproducibility. Notably, convenient default methods greatly facilitate the way current RNG settings can be changed.
base64enc | Simon Urbanek | 223,120 | 611 | This package provides tools for handling base64 encoding. It is more flexible than the orphaned based64 package.
Top 10 packages published in 2011
scales | Hadley Wickham | 1,305,000 | 3575 | Scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
devtools | Hadley Wickham | 738,724 | 2024 | Collection of package development tools
RcppEigen | Douglas Bates, Romain Francois and Dirk Eddelbuettel | 634,224 | 1738 | R and Eigen integration using Rcpp. Eigen is a C++ linear template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers. The performance on many algorithms is comparable with some of the best implementations based on Lapack and level-3 BLAS. The RcppEigen package includes the header files from the Eigen C++ template library (currently version 3.0.1). Thus users do not need to install Eigen itself in order to use RcppEigen. Eigen is licensed under the GNU LGPL version 3 or later, and also under the GNU GPL version 2 or later. RcppEigen (the Rcpp bindings/bridge to Eigen) is licensed under the GNU GPL version 2 or later, as is the rest of Rcpp.
fpp | Rob J Hyndman | 583,505 | 1599 | All data sets required for the workshop in Kandersteg, Switzerland, 20-22 June 2011.
nloptr | Jelmer Ypma | 583,230 | 1598 | nloptr is an R interface to NLopt. NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms. See
pbkrtest | Ulrich Halekoh Søren Højsgaard | 536,409 | 1470 | Test in linear mixed effects models based on parametric bootstrap approaches and Kenward-Roger modification of F-tests
roxygen2 | Hadley Wickham, Peter Danenberg, Manuel Eugster | 478,765 | 1312 | A Doxygen-like in-source documentation system for Rd, collation, and NAMESPACE.
whisker | Edwin de Jonge | 413,068 | 1132 | logicless templating, reuse templates in many programming languages including R
doParallel | Revolution Analytics | 299,717 | 821 | Provides a parallel backend for the %dopar% function using the parallel package.
abind | Tony Plate and Richard Heiberger | 255,151 | 699 | Combine multi-dimensional arrays into a single array. This is a generalization of cbind and rbind. Works with vectors, matrices, and higher-dimensional arrays. Also provides functions adrop, asub, and afill for manipulating, extracting and replacing data in arrays.
Top 10 packages published in 2010
reshape2 | Hadley Wickham | 1,395,099 | 3822 | Reshape lets you flexibly restructure and aggregate data using just two functions: melt and cast.
labeling | Justin Talbot | 1,104,986 | 3027 | Provides a range of axis labeling algorithms
evaluate | Hadley Wickham | 862,082 | 2362 | Parsing and evaluation tools that make it easy to recreate the command line behaviour of R.
formatR | Yihui Xie | 640,386 | 1754 | This package provides a GUI (using gWidgets) to format R source code. Spaces and indent will be added to the code automatically, so that R code will be more readable and tidy.
minqa | Katharine M. Mullen, John C. Nash, Ravi Varadhan | 600,527 | 1645 | Derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell
gridExtra | Baptiste Auguie | 581,140 | 1592 | misc. functions
memoise | Hadley Wickham | 552,383 | 1513 | Cache the results of a function so that when you call it again with the same arguments it returns the pre-computed value.
RJSONIO | Duncan Temple Lang | 414,373 | 1135 | This is a package that allows conversion to and from data in Javascript object notation (JSON) format. This allows R objects to be inserted into Javascript/ECMAScript/ActionScript code and allows R programmers to read and convert JSON content to R objects. This is an alternative to rjson package. That version is too slow for large data and not extensible, but a very useful prototype. This package uses methods, vectorized operations and C code and callbacks to R functions for deserializing JSON objects to R. In the future, we will implement the deserialization in C. There are some routines that can be used now for particular array types.
RcppArmadillo | Romain Francois and Dirk Eddelbuettel | 410,368 | 1124 | R and Armadillo integration using Rcpp Armadillo is a C++ linear algebra library aiming towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach is employed (during compile time) to combine several operations into one and reduce (or eliminate) the need for temporaries. This is accomplished through recursive templates and template meta-programming. This library is useful if C++ has been decided as the language of choice (due to speed and/or integration capabilities), rather than another language. This Armadillo / C integration provides a nice illustration of the capabilities of the Rcpp package for seamless R and C++ integration/
xlsx | Adrian A. Dragulescu | 401,991 | 1101 | Provide R functions to read/write/format Excel 2007 (xlsx) file formats.
Just as Safferling et al. noted, there is a dominance of technical packages. This is hardly surprising since the majority of work is data munging. Among these technical packages there are quite a few that are used for developing other packages, e.g. roxygen2, pkgmaker, devtools, and more.

R-star authors

Just for fun I decided to look at who has the most downloads. By splitting multi-author packages among their authors and also splitting the downloads, we find that in 2015 the top R coders were:
top_coders <- list(
  "2015" = 
    pkgs %>% 
    filter(format(date, "%Y") == 2015) %>% 
    partition(author) %>% 
    do({
      authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
      authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
      if (length(authors) >= 1){
        # If multiple authors the statistic is split among
        # them but with an added 20% for the extra collaboration
        # effort that a multi-author environment calls for
        .$sum <- round(.$sum/length(authors)*1.2)
        .$avg <- .$avg/length(authors)*1.2
        ret <- .
        ret$author <- authors[1]
        for (m in authors[-1]){
          tmp <- .
          tmp$author <- m
          ret <- rbind(ret, tmp)
        }
        return(ret)
      }else{
        return(.)
      }
    }) %>% 
    collect() %>% 
    group_by(author) %>% 
    summarise(download_ave = round(sum(avg)),
              no_packages = n(),
              packages = paste(name, collapse = ", ")) %>% 
    select(author, download_ave, no_packages, packages) %>% 
    collect() %>% 
    arrange(desc(download_ave)) %>% 
    head(10),
  "all" =
    pkgs %>% 
    partition(author) %>% 
    do({
      authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
      authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
      if (length(authors) >= 1){
        # If multiple authors the statistic is split among
        # them but with an added 20% for the extra collaboration
        # effort that a multi-author environment calls for
        .$sum <- round(.$sum/length(authors)*1.2)
        .$avg <- .$avg/length(authors)*1.2
        ret <- .
        ret$author <- authors[1]
        for (m in authors[-1]){
          tmp <- .
          tmp$author <- m
          ret <- rbind(ret, tmp)
        }
        return(ret)
      }else{
        return(.)
      }
    }) %>% 
    collect() %>% 
    group_by(author) %>% 
    summarise(download_ave = round(sum(avg)),
              no_packages = n(),
              packages = paste(name, collapse = ", ")) %>% 
    select(author, download_ave, no_packages, packages) %>% 
    collect() %>% 
    arrange(desc(download_ave)) %>% 
    head(30))
 
interactiveTable(
  do.call(rbind, top_coders) %>% 
    mutate(download_ave = txtInt(download_ave)),
  align = "lrr",
  header = c("Coder", "Total ave. downloads per day", "No. of packages", "Packages"),
  tspanner = c("Top coders 2015",
               "Top coders 2010-2015"),
  n.tspanner = sapply(top_coders, nrow),
  minimized.columns = 4, 
  rnames = FALSE, 
  col.rgroup = c("white", "#F0F0FF"))
Coder | Total ave. downloads | No. of packages | Packages
Top coders 2015
Gabor Csardi | 2,312 | 11 | sankey, franc, rversions, whoami, cranlogs, progress, simplegraph, pkgconfig, keypress, praise, clisymbols
Stefan Widgren | 1,563 | 1 | git2r
RStudio | 781 | 16 | shinydashboard, withr, shinybootstrap2, d3heatmap, readr, haven, xml2, purrr, ggplot2movies, bigrquery, leaflet, qrage, analogsea, svglite, gdtools, shinythemes
Hadley Wickham | 695 | 12 | withr, cellranger, cowplot, readr, xml2, purrr, ggplot2movies, bigrquery, leaflet, analogsea, svglite, gdtools
Jeroen Ooms | 541 | 10 | rjade, js, sodium, webp, brotli, bcrypt, x.ent, xml2, minimist, gdtools
Richard Cotton | 501 | 22 | assertive.base, assertive.types, assertive.properties, assertive.strings, assertive.numbers, assertive.reflection, assertive.files, assertive.data, assertive.data.uk, assertive.matrices, assertive.sets, assertive.models, assertive.data.us, assertive.datetimes, assertive.code, rebus.datetimes, rebus.base, rebus.numbers, rebus.unicode, rebus, runittotestthat, pathological
R Foundation | 490 | 1 | xml2
David Hoerl | 455 | 1 | readxl
Sindre Sorhus | 409 | 2 | praise, clisymbols
Richard Iannone | 294 | 2 | DiagrammeR, stationaRy
Top coders 2010-2015
Hadley Wickham | 32,115 | 55 | swirl, lazyeval, ggplot2movies, bigrquery, gdtools, Rd2roxygen, DescribeDisplay, DescribeDisplay, svglite, cowplot, cellranger, leaflet, pryr, dplyr, roxygen2, readr, plumbr, ggsubplot, ggsubplot, withr, analogsea, tourr, purrr, rmarkdown, broom, lubridate, magrittr, itertools, reshape2, evaluate, memoise, sinartra, scales, devtools, nullabor, hof, productplots, gtable, httr, nullabor, assertthat, hflights, tidyr, rvest, nycflights13, babynames, clusterfly, nasaweather, fueleconomy, ggmap, namespace, rappdirs, HistData, helpr, xml2
Yihui Xie | 9,739 | 18 | DT, Rd2roxygen, highr, tikzDevice, leaflet, rhandsontable, htmlwidgets, fun, rmarkdown, R2SWF, R2SWF, formatR, iBUGS, MSG, knitr, testit, servr, mime
RStudio | 9,123 | 25 | shinydashboard, lazyeval, ggplot2movies, bigrquery, gdtools, svglite, manipulate, rstudioapi, haven, leaflet, readr, scrypt, withr, shiny, httpuv, htmltools, ggvis, analogsea, purrr, d3heatmap, shinythemes, shinybootstrap2, rappdirs, qrage, xml2
Jeroen Ooms | 4,221 | 25 | JJcorr, gdtools, brotli, interactivity, x.ent, Mobilize, Ohmage, opencpu.encode, RAppArmor, opencpu, curl, V8, openssl, Mobilize, Ohmage, RPublica, webutils, js, sodium, webp, bcrypt, minimist, jsonlite, xml2, rjade
Justin Talbot | 3,633 | 1 | labeling
Winston Chang | 3,531 | 17 | shinydashboard, cm, cm, Rttf2pt1, withr, extra, extra, analogsea, shinythemes, downloader, extradb, gcookbook, bisectr, R6, namespace, shinybootstrap2, Rttf2pt1
Gabor Csardi | 3,437 | 26 | praise, clisymbols, franc, sankey, parsedate, sand, rappdirs, igraphdata, igraph0, isa2, igraph0, crayon, prettyunits, spark, falsy, disposables, dotenv, pingr, spareserver, rversions, whoami, cranlogs, progress, simplegraph, pkgconfig, keypress
Romain Francois | 2,934 | 20 | int64, LSD, RcppExamples, RcppClassicExamples, dplyr, readr, mlxR, RcppBDT, RcppClassic, highlight, parser, highlight, RcppEigen, RcppArmadillo, RProtoBuf, RcppGSL, base64, Rcpp11, dendextendRcpp, RcppParallel
Duncan Temple Lang | 2,854 | 6 | RMendeley, jsonlite, RJSONIO, Rstem, XMLSchema, SSOAP
Adrian A. Dragulescu | 2,456 | 2 | xlsx, xlsxjars
JJ Allaire | 2,453 | 7 | manipulate, htmlwidgets, packrat, rmarkdown, markdown, BH, RcppParallel
Simon Urbanek | 2,369 | 15 | png, fastmatch, jpeg, FastRWeb, OpenCL, base64enc, PKI, tiff, RSclient, RCassandra, fasttime, uuid, iotools, RSQLServer, emdist
Dirk Eddelbuettel | 2,094 | 33 | Rblpapi, RcppSMC, RApiSerialize, RcppExamples, RcppClassicExamples, RcppBDT, RcppClassic, gcbd, RVowpalWabbit, RcppCNPy, RcppZiggurat, RcppXts, pkgKitten, RcppRedis, RPushbullet, rfoaas, RcppAnnoy, sanitizers, drat, RcppStreams, littler, RcppCCTZ, RcppAPT, RcppTOML, gtrendsR, RcppEigen, BH, RcppArmadillo, RProtoBuf, RcppGSL, FinancialInstrument, dendextendRcpp, lbfgs
Stefan Milton Bache | 2,069 | 3 | import, blatr, magrittr
Douglas Bates | 1,966 | 5 | PKPDmodels, RcppEigen, MatrixModels, Matrix, pedigreemm
Renaud Gaujoux | 1,962 | 6 | NMF, doRNG, pkgmaker, rngtools, RcppOctave, doRNG
Jelmer Ypma | 1,933 | 2 | nloptr, SparseGrid
Rob J Hyndman | 1,933 | 3 | hts, fpp, demography
Baptiste Auguie | 1,924 | 2 | gridExtra, dielectric
Ulrich Halekoh Søren Højsgaard | 1,764 | 1 | pbkrtest
Martin Maechler | 1,682 | 11 | DescTools, stabledist, DEoptimR, expm, Bessel, MatrixModels, Matrix, nacopula, simsalapar, simsalapar, GLDEX
Mirai Solutions GmbH | 1,603 | 3 | XLConnect, XLConnectJars, XLConnectJars
Stefan Widgren | 1,563 | 1 | git2r
Edwin de Jonge | 1,513 | 10 | tabplot, tabplotGTK, whisker, ffbase, editrules, docopt, chunked, tabplotd3, validate, deducorrect
Kurt Hornik | 1,476 | 12 | movMF, ROI, qrmtools, RKEAjars, RWekajars, Unicode, NLP, openNLPdata, Rpoppler, NLPutils, W3CMarkupValidator, cclust
Deepayan Sarkar | 1,369 | 4 | qtbase, qtpaint, lattice, qtutils
Tyler Rinker | 1,203 | 9 | cowsay, wakefield, qdapTools, pacman, qdapRegex, qdap, qdapDictionaries, reports, regexr
Yixuan Qiu | 1,131 | 12 | gdtools, svglite, highr, fun, rARPACK, showtext, R2SWF, R2SWF, rationalfun, recosystem, syss, showtextdb
Revolution Analytics | 1,011 | 4 | doParallel, doSMP, revoIPC, checkpoint
Torsten Hothorn | 948 | 7 | MVA, HSAUR3, TH.data, partykit, hgam, MUCflights, stabs
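To make the credit-splitting step behind this table easier to follow, here is a minimal base-R sketch of the same idea on toy data. The data frame `pkgs_demo`, the helper `split_credit()` and all names and numbers in it are made up for illustration; only the two regular expressions and the 1.2 factor are taken from the code above. Like that code, the sketch applies the factor whenever at least one author remains after filtering, so single-author packages are scaled too.

```r
# Hypothetical toy data: three packages with an author string and
# an average number of daily downloads (all values are invented).
pkgs_demo <- data.frame(
  name   = c("pkgA", "pkgB", "pkgC"),
  author = c("Alice Smith, Bob Jones and Carol White",
             "Alice Smith",
             "Bob Jones, PhD"),
  avg    = c(300, 50, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per (author, package) pair,
# with the package's downloads divided among its authors.
split_credit <- function(pkgs, bonus = 1.2) {
  rows <- lapply(seq_len(nrow(pkgs)), function(i) {
    # Split the author string on commas, semicolons and " and "
    authors <- strsplit(pkgs$author[i], "[ ]*([,;]| and )[ ]*")[[1]]
    # Drop fragments that start with a title or affiliation keyword
    authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
    if (length(authors) == 0) return(NULL)
    # Equal share per author, scaled by the collaboration factor
    share <- pkgs$avg[i] / length(authors) * bonus
    data.frame(author = authors, name = pkgs$name[i], avg = share,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

credit <- split_credit(pkgs_demo)

# Sum the shares per author and rank, as the summarise() step does
totals <- aggregate(avg ~ author, data = credit, FUN = sum)
totals <- totals[order(-totals$avg), ]
print(totals)
```

In this toy example "PhD" is discarded as a non-author fragment, and Bob Jones ends up on top because he collects a share from two packages. The sketch also shows why the ranking favours authors with many packages: credit is summed over packages rather than averaged.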
It is worth mentioning that two of the top coders are companies: RStudio and Revolution Analytics. While I like the fact that R is free and open-source, I doubt that the community would have grown as quickly as it has without these companies. It is also symptomatic of 2015 that companies are taking R into account, and it will be interesting to see what the R Consortium brings to the community. I think the r-hub is incredibly interesting and will hopefully make my life as an R-package developer easier.

My own 2015-R-experience

My own personal R experience has been dominated by magrittr and dplyr, as seen in the code above. Like most, I find that magrittr makes things a little easier to read, and unless I have some really large dataset the overhead is small. It does have some downsides related to debugging, but these are negligible. When I originally tried dplyr out I came from the plyr environment and was disappointed by the lack of parallelization, and I found the concepts a little odd when thinking the plyr way. I had been using sqldf a lot in my data munging and merging, so when I found left_join, inner_join, and the brilliant anti_join I was completely sold. Combined with RStudio I find the dplyr workflow both intuitive and more productive than my previous one. When looking at those packages (including more than just the top 10 here) I did find some additional gems that I intend to look into when I have the time:

To leave a comment for the author, please follow the link and comment on their blog: R – G-Forge.
