[This article was first published on R – G-Forge, and kindly contributed to R-bloggers].
It is always fun to look back and reflect on the past year. Inspired by Christoph Safferling’s post on top packages published in 2015, I decided to have my own go at the top R trends of 2015. Unlike Safferling’s post, I’ll also (1) look at packages from previous years that hit the big league, (2) look at which top R coders we have in the community, and (3) round up with my own 2015 R experience.
Everything in this post is based on the CRANberries reports. To harvest the information I’ve borrowed shamelessly from Safferling’s post, with some modifications. He used the number of downloads as a proxy for the package release date, while I decided to use the actual release date; if that wasn’t available, I scraped it off the CRAN servers. The script now also retrieves package author(s) and description (see code below for details).
```r
library(rvest)
library(dplyr)
# devtools::install_github("hadley/multidplyr")
library(multidplyr)
library(magrittr)
library(lubridate)

getCranberriesElmnt <- function(txt, elmnt_name){
  desc <- grep(sprintf("^%s:", elmnt_name), txt)
  if (length(desc) == 1){
    txt <- txt[desc:length(txt)]
    end <- grep("^[A-Za-z/@]{2,}:", txt[-1])
    if (length(end) == 0)
      end <- length(txt)
    else
      end <- end[1]

    desc <-
      txt[1:end] %>%
      gsub(sprintf("^%s: (.+)", elmnt_name),
           "\\1", .) %>%
      paste(collapse = " ") %>%
      gsub("[ ]{2,}", " ", .) %>%
      gsub(" , ", ", ", .)
  }else if (length(desc) == 0){
    desc <- paste("No", tolower(elmnt_name))
  }else{
    stop("Could not find ", elmnt_name, " in text: \n",
         paste(txt, collapse = "\n"))
  }
  return(desc)
}

convertCharset <- function(txt){
  if (grepl("Windows", Sys.info()["sysname"]))
    txt <- iconv(txt, from = "UTF-8", to = "cp1252")
  return(txt)
}

getAuthor <- function(txt, package){
  author <- getCranberriesElmnt(txt, "Author")
  if (grepl("No author|See AUTHORS file", author)){
    author <- getCranberriesElmnt(txt, "Maintainer")
  }

  if (grepl("(No m|M)aintainer|(No a|A)uthor|^See AUTHORS file", author) ||
      is.null(author) ||
      nchar(author) <= 2){
    cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
                                  package))
    author <- cran_txt %>%
      html_nodes("tr") %>%
      html_text %>%
      convertCharset %>%
      gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
      .[grep("^Author", .)] %>%
      gsub(".*\n", "", .)

    # If not found then the package has probably been
    # removed from the repository
    if (length(author) == 1)
      author <- author
    else
      author <- "No author"
  }

  # Remove stuff such as:
  # [cre, auth]
  # (worked on the...)
  # <my@email.com>
  # "John Doe"
  author %<>%
    gsub("^Author: (.+)", "\\1", .) %>%
    gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>%
    gsub("\\([^)]+\\)", " ", .) %>%
    gsub("([ ]*<[^>]+>)", " ", .) %>%
    gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>%
    gsub("[ ]{2,}", " ", .) %>%
    gsub("(^[ '\"]+|[ '\"]+$)", "", .) %>%
    gsub(" , ", ", ", .)

  return(author)
}

getDate <- function(txt, package){
  date <- grep("^Date/Publication", txt)
  if (length(date) == 1){
    date <- txt[date] %>%
      gsub("Date/Publication: ([0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2}).*",
           "\\1", .)
  }else{
    cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
                                  package))
    date <-
      cran_txt %>%
      html_nodes("tr") %>%
      html_text %>%
      convertCharset %>%
      gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
      .[grep("^Published", .)] %>%
      gsub(".*\n", "", .)

    # The main page doesn't contain the original date if
    # new packages have been submitted, we therefore need
    # to check first entry in the archives
    if(cran_txt %>%
       html_nodes("tr") %>%
       html_text %>%
       gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
       grepl("^Old.{1,4}sources", .) %>%
       any){
      archive_txt <- read_html(sprintf("http://cran.r-project.org/src/contrib/Archive/%s/",
                                       package))
      pkg_date <-
        archive_txt %>%
        html_nodes("tr") %>%
        lapply(function(x) {
          nodes <- html_nodes(x, "td")
          if (length(nodes) == 5){
            return(nodes[3] %>%
                     html_text %>%
                     as.Date(format = "%d-%b-%Y"))
          }
        }) %>%
        .[sapply(., length) > 0] %>%
        .[!sapply(., is.na)] %>%
        head(1)

      if (length(pkg_date) == 1)
        date <- pkg_date[[1]]
    }
  }
  date <- tryCatch({
    as.Date(date)
  }, error = function(e){
    "Date missing"
  })
  return(date)
}

getNewPkgStats <- function(published_in){
  # The parallel is only for making cranlogs requests
  # we can therefore have more cores than actual cores
  # as this isn't processor intensive while there is
  # considerable wait for each http-request
  cl <- create_cluster(parallel::detectCores() * 4)
  parallel::clusterEvalQ(cl, {
    library(cranlogs)
  })
  set_default_cluster(cl)
  on.exit(stop_cluster())

  berries <- read_html(paste0("http://dirk.eddelbuettel.com/cranberries/",
                              published_in, "/"))
  pkgs <-
    # Select the divs of the package class
    html_nodes(berries, ".package") %>%
    # Extract the text
    html_text %>%
    # Split the lines
    strsplit("[\n]+") %>%
    # Now clean the lines
    lapply(.,
           function(pkg_txt) {
             pkg_txt[sapply(pkg_txt,
                            function(x) { nchar(gsub("^[ \t]+", "", x)) > 0},
                            USE.NAMES = FALSE)] %>%
               gsub("^[ \t]+", "", .)
           })

  # Now we select the new packages
  new_packages <-
    pkgs %>%
    # The first line is key as it contains the text "New package"
    sapply(., function(x) x[1], USE.NAMES = FALSE) %>%
    grep("^New package", .) %>%
    pkgs[.] %>%
    # Now we extract the package name and the date that it was published
    # and merge everything into one table
    lapply(function(txt){
      txt <- convertCharset(txt)
      ret <- data.frame(
        name = gsub("^New package ([^ ]+) with initial .*",
                    "\\1", txt[1]),
        stringsAsFactors = FALSE
      )

      ret$desc <- getCranberriesElmnt(txt, "Description")
      ret$author <- getAuthor(txt, ret$name)
      ret$date <- getDate(txt, ret$name)

      return(ret)
    }) %>%
    rbind_all %>%
    # Get the download data in parallel
    partition(name) %>%
    do({
      down <- cran_downloads(.$name[1],
                             from = max(as.Date("2015-01-01"), .$date[1]),
                             to = "2015-12-31")$count
      cbind(.[1,],
            data.frame(sum = sum(down),
                       avg = mean(down))
      )
    }) %>%
    collect %>%
    ungroup %>%
    arrange(desc(avg))

  return(new_packages)
}

pkg_list <-
  lapply(2010:2015,
         getNewPkgStats)

pkgs <-
  rbind_all(pkg_list) %>%
  mutate(time = as.numeric(as.Date("2016-01-01") - date),
         year = format(date, "%Y"))
```
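To make the parsing step easier to follow without hitting the network, here is a minimal base-R sketch of the same idea: pick a `Key: value` field out of the cleaned lines of one entry, including continuation lines, and strip roles and e-mail addresses from the author string. The sample entry and the helper names `extract_field` and `clean_author` are made up for illustration; they are simplified stand-ins, not the functions used above.

```r
# Hypothetical sample of the cleaned lines for one CRANberries entry
entry <- c(
  "New package foo with initial version 0.1.0",
  "Package: foo",
  "Title: A Toy Package",
  "Author: Jane Doe [aut, cre], John Roe <john@example.org>",
  "Description: Does a first thing and",
  "a second thing.",
  "License: GPL-2",
  "Date/Publication: 2015-03-14 10:20:30"
)

# Return the value of a field, including continuation lines that follow it
extract_field <- function(lines, field) {
  start <- grep(sprintf("^%s:", field), lines)
  if (length(start) != 1) return(NA_character_)
  rest <- lines[start:length(lines)]
  # The next line that looks like "Key:" ends the field
  next_key <- grep("^[A-Za-z/@]{2,}:", rest[-1])
  end <- if (length(next_key) == 0) length(rest) else next_key[1]
  value <- paste(rest[1:end], collapse = " ")
  sub(sprintf("^%s: ", field), "", value)
}

# Strip roles and e-mail addresses, then tidy up the whitespace
clean_author <- function(author) {
  author <- gsub("\\[[^]]+\\]", " ", author)  # roles such as [aut, cre]
  author <- gsub("<[^>]+>", " ", author)      # e-mail addresses
  author <- gsub("[ ]{2,}", " ", author)
  author <- gsub(" , ", ", ", author)
  trimws(author)
}

pkg_desc   <- extract_field(entry, "Description")
pkg_author <- clean_author(extract_field(entry, "Author"))
# Keep only the date part (first 10 characters) of the publication stamp
pkg_date   <- as.Date(substr(extract_field(entry, "Date/Publication"), 1, 10))

print(pkg_desc)
print(pkg_author)
print(pkg_date)
```

A field that is missing returns `NA` here, whereas the script above falls back to the CRAN package page in that case.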
Downloads and time on CRAN
The longer a package has been on CRAN, the more it gets downloaded. We can illustrate this using simple linear regression; slightly surprisingly, the relationship is mostly linear:
```r
pkgs %<>%
  mutate(time_yrs = time/365.25)
fit <- lm(avg ~ time_yrs, data = pkgs)

# Test for non-linearity
library(splines)
anova(fit,
      update(fit, .~.-time_yrs+ns(time_yrs, 2)))
```
```
Analysis of Variance Table

Model 1: avg ~ time
Model 2: avg ~ ns(time, 2)
  Res.Df       RSS Df Sum of Sq      F Pr(>F)
1   7348 189661922
2   7347 189656567  1    5355.1 0.2075 0.6488
```

The average number of daily downloads increases by about 5 per year on CRAN. It can easily be argued that the average isn’t that interesting since the data is skewed; we can therefore also look at the upper quantiles using quantile regression:
```r
library(quantreg)
library(htmlTable)
lapply(c(.5, .75, .95, .99),
       function(tau){
         rq_fit <- rq(avg ~ time_yrs, data = pkgs, tau = tau)
         rq_sum <- summary(rq_fit)
         c(Estimate = txtRound(rq_sum$coefficients[2, 1], 1),
           `95 % CI` = txtRound(rq_sum$coefficients[2, 1] +
                                  c(1,-1) * rq_sum$coefficients[2, 2], 1) %>%
             paste(collapse = " to "))
       }) %>%
  do.call(rbind, .) %>%
  htmlTable(rnames = c("Median",
                       "Upper quartile",
                       "Top 5%",
                       "Top 1%"))
```
Quantile | Estimate | 95 % CI |
---|---|---|
Median | 0.6 | 0.6 to 0.6 |
Upper quartile | 1.2 | 1.2 to 1.1 |
Top 5% | 9.7 | 11.9 to 7.6 |
Top 1% | 182.5 | 228.2 to 136.9 |
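The two ideas in this section can be tried on simulated data with nothing beyond base R and the bundled `splines` package. The sketch below is only an illustration under stated assumptions: the data frame `sim` and its growth and skew parameters are invented, and the second half compares the mean with empirical quantiles rather than fitting a quantile regression.

```r
library(splines)  # ships with the standard R distribution

set.seed(2015)
n <- 2000
# Simulated stand-in for the package data: years on CRAN and
# right-skewed average daily downloads that grow linearly with time
sim <- data.frame(time_yrs = runif(n, 0, 6))
sim$avg <- (5 + 5 * sim$time_yrs) * rlnorm(n, meanlog = 0, sdlog = 1)

# Nested-model F-test: does a 2-df natural spline beat the straight line?
fit_lin <- lm(avg ~ time_yrs, data = sim)
fit_ns  <- lm(avg ~ ns(time_yrs, 2), data = sim)
nonlin_test <- anova(fit_lin, fit_ns)
print(nonlin_test)

# With skewed data the mean sits above the median, which is why
# the upper quantiles are worth a separate look
skew_summary <- c(mean = mean(sim$avg),
                  quantile(sim$avg, c(0.5, 0.75, 0.95, 0.99)))
print(round(skew_summary, 1))
```

A large p-value in the second row of the ANOVA table means the spline term adds little over the straight line, which is the pattern reported above for the real data.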
Top downloaded packages
In order to investigate what packages R users have been using during 2015, I’ve looked at all new packages since the turn of the decade. Since each year of CRAN presence increases the download rates, I’ve split the table by the package release dates. The results are available for browsing below (yes, it is the brand new interactive htmlTable that allows you to collapse cells; note that it may not work if you are reading this on R-bloggers, and the link is lost under certain circumstances).

Name | Author | Total downloads | Average downloads/day | Description |
---|---|---|---|---|
Top 10 packages published in 2015 | | | | |
xml2 | Hadley Wickham, Jeroen Ooms, RStudio, R Foundation | 348,222 | 1635 | Work with XML files using a simple, consistent interface. Built on top of the ‘libxml2’ C library. |
rversions | Gabor Csardi | 386,996 | 1524 | Query the main R SVN repository to find the versions r-release and r-oldrel refer to, and also all previous R versions and their release dates. |
git2r | Stefan Widgren | 411,709 | 1303 | Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and running some basic git commands. |
praise | Gabor Csardi, Sindre Sorhus | 96,187 | 673 | Build friendly R packages that praise their users if they have done something good, or they just need it to feel better. |
readxl | David Hoerl | 99,386 | 379 | Import excel files into R. Supports ‘.xls’ via the embedded ‘libxls’ C library (http://sourceforge.net/projects/libxls/) and ‘.xlsx’ via the embedded ‘RapidXML’ C++ library (http://rapidxml.sourceforge.net). Works on Windows, Mac and Linux without external dependencies. |
readr | Hadley Wickham, Romain Francois, R Core Team, RStudio | 90,022 | 337 | Read flat/tabular text files from disk. |
DiagrammeR | Richard Iannone | 84,259 | 236 | Create diagrams and flowcharts using R. |
visNetwork | Almende B.V. (vis.js library in htmlwidgets/lib, | 41,185 | 233 | Provides an R interface to the ‘vis.js’ JavaScript charting library. It allows an interactive visualization of networks. |
plotly | Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, Pedro Despouy | 9,745 | 217 | Easily translate ggplot2 graphs to an interactive web-based version and/or create custom web-based visualizations directly from R. Once uploaded to a plotly account, plotly graphs (and the data behind them) can be viewed and modified in a web browser. |
DT | Yihui Xie, Joe Cheng, jQuery contributors, SpryMedia Limited, Brian Reavis, Leon Gersen, Bartek Szopka, RStudio Inc | 24,806 | 120 | Data objects in R can be rendered as HTML tables using the JavaScript library ‘DataTables’ (typically via R Markdown or Shiny). The ‘DataTables’ library has been included in this R package. The package name ‘DT’ is an abbreviation of ‘DataTables’. |
Top 10 packages published in 2014 | | | | |
stringi | Marek Gagolewski and Bartek Tartanus ; IBM and other contributors ; Unicode, Inc. | 1,316,900 | 3608 | stringi allows for very fast, correct, consistent, and convenient character string/text processing in each locale and any native encoding. Thanks to the use of the ICU library, the package provides R users with a platform-independent functionality known to Java, Perl, Python, PHP and Ruby programmers. |
magrittr | Stefan Milton Bache and Hadley Wickham | 1,245,662 | 3413 | Provides a mechanism for chaining commands with a new forward-pipe operator. Ceci n’est pas un pipe. |
mime | Yihui Xie | 1,038,591 | 2845 | This package guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems. |
R6 | Winston Chang | 920,147 | 2521 | The R6 package allows the creation of classes with reference semantics, similar to R’s built-in reference classes. Compared to reference classes, R6 classes are simpler and lighter-weight, and they are not built on S4 classes so they do not require the methods package. These classes allow public and private members, and they support inheritance. |
dplyr | Hadley Wickham, Romain Francois | 778,311 | 2132 | A fast, consistent tool for working with data frame like objects, both in memory and out of memory. |
manipulate | JJ Allaire, RStudio | 626,191 | 1716 | Interactive plotting functions for use within RStudio. The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, checkbox, or button) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn. |
htmltools | RStudio, Inc. | 619,171 | 1696 | Tools for HTML generation and output |
curl | Jeroen Ooms | 599,704 | 1643 | The curl() function provides a drop-in replacement for base url() with better performance and support for http 2.0, ssl (https, ftps), gzip, deflate and other libcurl goodies. This interface is implemented using the RConnection API in order to support incremental processing of both binary and text streams. If you are looking for a more user friendly http client, try the RCurl or httr packages instead. |
lazyeval | Hadley Wickham, RStudio | 572,546 | 1569 | A disciplined approach to non-standard evaluation. |
rstudioapi | RStudio | 515,665 | 1413 | This package provides functions to make it easy to access the RStudio API when available, and provide informative error messages when not. |
Top 10 packages published in 2013 | | | | |
jsonlite | Jeroen Ooms, Duncan Temple Lang | 906,421 | 2483 | This package is a fork of the RJSONIO package by Duncan Temple Lang. It builds on the parser from RJSONIO, but implements a different mapping between R objects and JSON strings. The C code in this package is mostly from Temple Lang, the R code has been rewritten from scratch. In addition to drop-in replacements for fromJSON and toJSON, the package has functions to serialize objects. Furthermore, the package contains a lot of unit tests to make sure that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications. |
BH | John W. Emerson, Michael J. Kane, Dirk Eddelbuettel, JJ Allaire, and Romain Francois | 691,280 | 1894 | Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile-time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN package. By placing these libraries in this package, we offer a more efficient distribution system for CRAN as replication of this code in the sources of other packages is avoided. |
highr | Yihui Xie and Yixuan Qiu | 641,052 | 1756 | This package provides syntax highlighting for R source code. Currently it supports LaTeX and HTML output. Source code of other languages can be supported via Andre Simon’s Highlight package. |
assertthat | Hadley Wickham | 527,961 | 1446 | assertthat is an extension to stopifnot() that makes it easy to declare the pre and post conditions that you code should satisfy, while also producing friendly error messages so that your users know what they’ve done wrong. |
httpuv | RStudio, Inc. | 310,699 | 851 | httpuv provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.) |
NLP | Kurt Hornik | 270,682 | 742 | Basic classes and methods for Natural Language Processing. |
TH.data | Torsten Hothorn | 242,060 | 663 | Contains data sets used on other packages I maintain. |
NMF | Renaud Gaujoux, Cathal Seoighe | 228,807 | 627 | This package provides a framework to perform Non-negative Matrix Factorization (NMF). It implements a set of already published algorithms and seeding methods, and provides a framework to test, develop and plug new/custom algorithms. Most of the built-in algorithms have been optimized in C++, and the main interface function provides an easy way of performing parallel computations on multicore machines. |
stringdist | Mark van der Loo | 123,138 | 337 | Implements the Hamming distance and weighted versions of the Levenshtein, restricted Damerau-Levenshtein (optimal string alignment), and Damerau-Levenshtein distance. |
SnowballC | Milan Bouchet-Valat | 104,411 | 286 | An R interface to the C libstemmer library that implements Porter’s word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish. |
Top 10 packages published in 2012 | | | | |
gtable | Hadley Wickham | 1,091,440 | 2990 | Tools to make it easier to work with “tables” of grobs. |
knitr | Yihui Xie | 792,876 | 2172 | This package provides a general-purpose tool for dynamic report generation in R, which can be used to deal with any type of (plain text) files, including Sweave and HTML. The patterns of code chunks and inline R expressions can be customized. R code is evaluated as if it were copied and pasted in an R terminal thanks to the evaluate package (e.g. we do not need to explicitly print() plots from ggplot2 or lattice). R code can be reformatted by the formatR package so that long lines are automatically wrapped, with indent and spaces being added, and comments being preserved. A simple caching mechanism is provided to cache results from computations for the first time and the computations will be skipped the next time. Almost all common graphics devices, including those in base R and add-on packages like Cairo, cairoDevice and tikzDevice, are built-in with this package and it is straightforward to switch between devices without writing any special functions. The width and height as well as alignment of plots in the output document can be specified in chunk options (the size of plots for graphics devices is still supported as usual). Multiple plots can be recorded in a single code chunk, and it is also allowed to rearrange plots to the end of a chunk or just keep the last plot. Warnings, messages and errors are written in the output document by default (can be turned off). Currently LaTeX, HTML and Markdown are supported, and other output formats can be supported by hook functions. The large collection of hooks in this package makes it possible for the user to control almost everything in the R code input and output. Hooks can be used either to format the output or to run a specified R code fragment before or after a code chunk. Most features are borrowed or inspired by Sweave, cacheSweave, pgfSweave, brew and decumar. |
httr | Hadley Wickham | 785,568 | 2152 | Provides useful tools for working with HTTP connections. Is a simplified wrapper built on top of RCurl. It is much much less configurable but because it only attempts to encompass the most common operations it is also much much simpler. |
markdown | JJ Allaire, Jeffrey Horner, Vicent Marti, and Natacha Porte | 636,888 | 1745 | Markdown is a plain-text formatting syntax that can be converted to XHTML or other formats. This package provides R bindings to the Sundown markdown rendering library. |
Matrix | Douglas Bates and Martin Maechler | 470,468 | 1289 | Classes and methods for dense and sparse matrices and operations on them using Lapack and SuiteSparse. |
shiny | RStudio, Inc. | 427,995 | 1173 | Shiny makes it incredibly easy to build interactive web applications with R. Automatic “reactive” binding between inputs and outputs and extensive pre-built widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort. |
lattice | Deepayan Sarkar | 414,716 | 1136 | Lattice is a powerful and elegant high-level data visualization system, with an emphasis on multivariate data, that is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction. |
pkgmaker | Renaud Gaujoux | 225,796 | 619 | This package provides some low-level utilities to use for package development. It currently provides managers for multiple package specific options and registries, vignette, unit test and bibtex related utilities. It serves as a base package for packages like NMF, RcppOctave, doRNG, and as an incubator package for other general purposes utilities, that will eventually be packaged separately. It is still under heavy development and changes in the interface(s) are more than likely to happen. |
rngtools | Renaud Gaujoux | 225,125 | 617 | This package contains a set of functions for working with Random Number Generators (RNGs). In particular, it defines a generic S4 framework for getting/setting the current RNG, or RNG data that are embedded into objects for reproducibility. Notably, convenient default methods greatly facilitate the way current RNG settings can be changed. |
base64enc | Simon Urbanek | 223,120 | 611 | This package provides tools for handling base64 encoding. It is more flexible than the orphaned based64 package. |
Top 10 packages published in 2011 | | | | |
scales | Hadley Wickham | 1,305,000 | 3575 | Scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends. |
devtools | Hadley Wickham | 738,724 | 2024 | Collection of package development tools |
RcppEigen | Douglas Bates, Romain Francois and Dirk Eddelbuettel | 634,224 | 1738 | R and Eigen integration using Rcpp. Eigen is a C++ linear template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers. The performance on many algorithms is comparable with some of the best implementations based on Lapack and level-3 BLAS. The RcppEigen package includes the header files from the Eigen C++ template library (currently version 3.0.1). Thus users do not need to install Eigen itself in order to use RcppEigen. Eigen is licensed under the GNU LGPL version 3 or later, and also under the GNU GPL version 2 or later. RcppEigen (the Rcpp bindings/bridge to Eigen) is licensed under the GNU GPL version 2 or later, as is the rest of Rcpp. |
fpp | Rob J Hyndman | 583,505 | 1599 | All data sets required for the workshop in Kandersteg, Switzerland, 20-22 June 2011. |
nloptr | Jelmer Ypma | 583,230 | 1598 | nloptr is an R interface to NLopt. NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms. See |
pbkrtest | Ulrich Halekoh Søren Højsgaard | 536,409 | 1470 | Test in linear mixed effects models based on parametric bootstrap approaches and Kenward-Roger modification of F-tests |
roxygen2 | Hadley Wickham, Peter Danenberg, Manuel Eugster | 478,765 | 1312 | A Doxygen-like in-source documentation system for Rd, collation, and NAMESPACE. |
whisker | Edwin de Jonge | 413,068 | 1132 | logicless templating, reuse templates in many programming languages including R |
doParallel | Revolution Analytics | 299,717 | 821 | Provides a parallel backend for the %dopar% function using the parallel package. |
abind | Tony Plate and Richard Heiberger | 255,151 | 699 | Combine multi-dimensional arrays into a single array. This is a generalization of cbind and rbind. Works with vectors, matrices, and higher-dimensional arrays. Also provides functions adrop, asub, and afill for manipulating, extracting and replacing data in arrays. |
Top 10 packages published in 2010 | | | | |
reshape2 | Hadley Wickham | 1,395,099 | 3822 | Reshape lets you flexibly restructure and aggregate data using just two functions: melt and cast. |
labeling | Justin Talbot | 1,104,986 | 3027 | Provides a range of axis labeling algorithms |
evaluate | Hadley Wickham | 862,082 | 2362 | Parsing and evaluation tools that make it easy to recreate the command line behaviour of R. |
formatR | Yihui Xie | 640,386 | 1754 | This package provides a GUI (using gWidgets) to format R source code. Spaces and indent will be added to the code automatically, so that R code will be more readable and tidy. |
minqa | Katharine M. Mullen, John C. Nash, Ravi Varadhan | 600,527 | 1645 | Derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell |
gridExtra | Baptiste Auguie | 581,140 | 1592 | misc. functions |
memoise | Hadley Wickham | 552,383 | 1513 | Cache the results of a function so that when you call it again with the same arguments it returns the pre-computed value. |
RJSONIO | Duncan Temple Lang | 414,373 | 1135 | This is a package that allows conversion to and from data in Javascript object notation (JSON) format. This allows R objects to be inserted into Javascript/ECMAScript/ActionScript code and allows R programmers to read and convert JSON content to R objects. This is an alternative to rjson package. That version is too slow for large data and not extensible, but a very useful prototype. This package uses methods, vectorized operations and C code and callbacks to R functions for deserializing JSON objects to R. In the future, we will implement the deserialization in C. There are some routines that can be used now for particular array types. |
RcppArmadillo | Romain Francois and Dirk Eddelbuettel | 410,368 | 1124 | R and Armadillo integration using Rcpp Armadillo is a C++ linear algebra library aiming towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach is employed (during compile time) to combine several operations into one and reduce (or eliminate) the need for temporaries. This is accomplished through recursive templates and template meta-programming. This library is useful if C++ has been decided as the language of choice (due to speed and/or integration capabilities), rather than another language. This Armadillo / C integration provides a nice illustration of the capabilities of the Rcpp package for seamless R and C++ integration/ |
xlsx | Adrian A. Dragulescu | 401,991 | 1101 | Provide R functions to read/write/format Excel 2007 (xlsx) file formats. |
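The grouping behind the table above (rank by average daily downloads within each release year, keep the top entries, newest year first) can be sketched in base R as follows. The data frame `toy` and the helper `top_n_per_year` are hypothetical and only mimic the `name`, `year` and `avg` columns of `pkgs`; this is not the code that produced the table.

```r
# Hypothetical miniature of the `pkgs` data frame
toy <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  year = c("2014", "2014", "2014", "2015", "2015", "2015"),
  avg  = c(120, 3400, 15, 1600, 40, 700),
  stringsAsFactors = FALSE
)

top_n_per_year <- function(df, n = 10) {
  by_year <- split(df, df$year)
  # Within each year, sort by average daily downloads and keep the top n
  ranked <- lapply(by_year, function(x) {
    head(x[order(x$avg, decreasing = TRUE), ], n)
  })
  # Newest release year first, as in the table
  do.call(rbind, ranked[order(names(ranked), decreasing = TRUE)])
}

top2 <- top_n_per_year(toy, n = 2)
print(top2)
```

Ranking within a release year rather than across all packages keeps older packages, which have had more time to accumulate downloads, from crowding out the newer ones.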
R-star authors
Just for fun I decided to look at who has the most downloads. By splitting multi-author packages into one entry per author, and sharing the downloads among them, we find that in 2015 the top R coders were:
```r
top_coders <- list(
  "2015" =
    pkgs %>%
    filter(format(date, "%Y") == 2015) %>%
    partition(author) %>%
    do({
      authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
      authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
      if (length(authors) >= 1){
        # The statistic is split among the authors, with an added 20%
        # for the extra collaboration effort that a multi-author
        # environment calls for. Note that, as written, the 1.2 factor
        # is also applied when only a single author remains.
        .$sum <- round(.$sum/length(authors)*1.2)
        .$avg <- .$avg/length(authors)*1.2
        ret <- .
        ret$author <- authors[1]
        for (m in authors[-1]){
          tmp <- .
          tmp$author <- m
          ret <- rbind(ret, tmp)
        }
        return(ret)
      }else{
        return(.)
      }
    }) %>%
    collect() %>%
    group_by(author) %>%
    summarise(download_ave = round(sum(avg)),
              no_packages = n(),
              packages = paste(name, collapse = ", ")) %>%
    select(author, download_ave, no_packages, packages) %>%
    collect() %>%
    arrange(desc(download_ave)) %>%
    head(10),
  "all" =
    pkgs %>%
    partition(author) %>%
    do({
      authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
      authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
      if (length(authors) >= 1){
        # Same splitting rule as for the 2015 list above
        .$sum <- round(.$sum/length(authors)*1.2)
        .$avg <- .$avg/length(authors)*1.2
        ret <- .
        ret$author <- authors[1]
        for (m in authors[-1]){
          tmp <- .
          tmp$author <- m
          ret <- rbind(ret, tmp)
        }
        return(ret)
      }else{
        return(.)
      }
    }) %>%
    collect() %>%
    group_by(author) %>%
    summarise(download_ave = round(sum(avg)),
              no_packages = n(),
              packages = paste(name, collapse = ", ")) %>%
    select(author, download_ave, no_packages, packages) %>%
    collect() %>%
    arrange(desc(download_ave)) %>%
    head(30))

interactiveTable(
  do.call(rbind, top_coders) %>%
    mutate(download_ave = txtInt(download_ave)),
  align = "lrr",
  header = c("Coder", "Total ave. downloads per day", "No. of packages", "Packages"),
  tspanner = c("Top coders 2015",
               "Top coders 2010-2015"),
  n.tspanner = sapply(top_coders, nrow),
  minimized.columns = 4,
  rnames = FALSE,
  col.rgroup = c("white", "#F0F0FF"))
```
Coder | Total ave. downloads | No. of packages | Packages |
---|---|---|---|
Top coders 2015 | |||
Gabor Csardi | 2,312 | 11 | sankey, franc, rversions, whoami, cranlogs, progress, simplegraph, pkgconfig, keypress, praise, clisymbols |
Stefan Widgren | 1,563 | 1 | git2r |
RStudio | 781 | 16 | shinydashboard, withr, shinybootstrap2, d3heatmap, readr, haven, xml2, purrr, ggplot2movies, bigrquery, leaflet, qrage, analogsea, svglite, gdtools, shinythemes |
Hadley Wickham | 695 | 12 | withr, cellranger, cowplot, readr, xml2, purrr, ggplot2movies, bigrquery, leaflet, analogsea, svglite, gdtools |
Jeroen Ooms | 541 | 10 | rjade, js, sodium, webp, brotli, bcrypt, x.ent, xml2, minimist, gdtools |
Richard Cotton | 501 | 22 | assertive.base, assertive.types, assertive.properties, assertive.strings, assertive.numbers, assertive.reflection, assertive.files, assertive.data, assertive.data.uk, assertive.matrices, assertive.sets, assertive.models, assertive.data.us, assertive.datetimes, assertive.code, rebus.datetimes, rebus.base, rebus.numbers, rebus.unicode, rebus, runittotestthat, pathological |
R Foundation | 490 | 1 | xml2 |
David Hoerl | 455 | 1 | readxl |
Sindre Sorhus | 409 | 2 | praise, clisymbols |
Richard Iannone | 294 | 2 | DiagrammeR, stationaRy |
Top coders 2010-2015 | |||
Hadley Wickham | 32,115 | 55 | swirl, lazyeval, ggplot2movies, bigrquery, gdtools, Rd2roxygen, DescribeDisplay, DescribeDisplay, svglite, cowplot, cellranger, leaflet, pryr, dplyr, roxygen2, readr, plumbr, ggsubplot, ggsubplot, withr, analogsea, tourr, purrr, rmarkdown, broom, lubridate, magrittr, itertools, reshape2, evaluate, memoise, sinartra, scales, devtools, nullabor, hof, productplots, gtable, httr, nullabor, assertthat, hflights, tidyr, rvest, nycflights13, babynames, clusterfly, nasaweather, fueleconomy, ggmap, namespace, rappdirs, HistData, helpr, xml2 |
Yihui Xie | 9,739 | 18 | DT, Rd2roxygen, highr, tikzDevice, leaflet, rhandsontable, htmlwidgets, fun, rmarkdown, R2SWF, R2SWF, formatR, iBUGS, MSG, knitr, testit, servr, mime |
RStudio | 9,123 | 25 | shinydashboard, lazyeval, ggplot2movies, bigrquery, gdtools, svglite, manipulate, rstudioapi, haven, leaflet, readr, scrypt, withr, shiny, httpuv, htmltools, ggvis, analogsea, purrr, d3heatmap, shinythemes, shinybootstrap2, rappdirs, qrage, xml2 |
Jeroen Ooms | 4,221 | 25 | JJcorr, gdtools, brotli, interactivity, x.ent, Mobilize, Ohmage, opencpu.encode, RAppArmor, opencpu, curl, V8, openssl, Mobilize, Ohmage, RPublica, webutils, js, sodium, webp, bcrypt, minimist, jsonlite, xml2, rjade |
Justin Talbot | 3,633 | 1 | labeling |
Winston Chang | 3,531 | 17 | shinydashboard, cm, cm, Rttf2pt1, withr, extra, extra, analogsea, shinythemes, downloader, extradb, gcookbook, bisectr, R6, namespace, shinybootstrap2, Rttf2pt1 |
Gabor Csardi | 3,437 | 26 | praise, clisymbols, franc, sankey, parsedate, sand, rappdirs, igraphdata, igraph0, isa2, igraph0, crayon, prettyunits, spark, falsy, disposables, dotenv, pingr, spareserver, rversions, whoami, cranlogs, progress, simplegraph, pkgconfig, keypress |
Romain Francois | 2,934 | 20 | int64, LSD, RcppExamples, RcppClassicExamples, dplyr, readr, mlxR, RcppBDT, RcppClassic, highlight, parser, highlight, RcppEigen, RcppArmadillo, RProtoBuf, RcppGSL, base64, Rcpp11, dendextendRcpp, RcppParallel |
Duncan Temple Lang | 2,854 | 6 | RMendeley, jsonlite, RJSONIO, Rstem, XMLSchema, SSOAP |
Adrian A. Dragulescu | 2,456 | 2 | xlsx, xlsxjars |
JJ Allaire | 2,453 | 7 | manipulate, htmlwidgets, packrat, rmarkdown, markdown, BH, RcppParallel |
Simon Urbanek | 2,369 | 15 | png, fastmatch, jpeg, FastRWeb, OpenCL, base64enc, PKI, tiff, RSclient, RCassandra, fasttime, uuid, iotools, RSQLServer, emdist |
Dirk Eddelbuettel | 2,094 | 33 | Rblpapi, RcppSMC, RApiSerialize, RcppExamples, RcppClassicExamples, RcppBDT, RcppClassic, gcbd, RVowpalWabbit, RcppCNPy, RcppZiggurat, RcppXts, pkgKitten, RcppRedis, RPushbullet, rfoaas, RcppAnnoy, sanitizers, drat, RcppStreams, littler, RcppCCTZ, RcppAPT, RcppTOML, gtrendsR, RcppEigen, BH, RcppArmadillo, RProtoBuf, RcppGSL, FinancialInstrument, dendextendRcpp, lbfgs |
Stefan Milton Bache | 2,069 | 3 | import, blatr, magrittr |
Douglas Bates | 1,966 | 5 | PKPDmodels, RcppEigen, MatrixModels, Matrix, pedigreemm |
Renaud Gaujoux | 1,962 | 6 | NMF, doRNG, pkgmaker, rngtools, RcppOctave, doRNG |
Jelmer Ypma | 1,933 | 2 | nloptr, SparseGrid |
Rob J Hyndman | 1,933 | 3 | hts, fpp, demography |
Baptiste Auguie | 1,924 | 2 | gridExtra, dielectric |
Ulrich Halekoh Søren Højsgaard | 1,764 | 1 | pbkrtest |
Martin Maechler | 1,682 | 11 | DescTools, stabledist, DEoptimR, expm, Bessel, MatrixModels, Matrix, nacopula, simsalapar, simsalapar, GLDEX |
Mirai Solutions GmbH | 1,603 | 3 | XLConnect, XLConnectJars, XLConnectJars |
Stefan Widgren | 1,563 | 1 | git2r |
Edwin de Jonge | 1,513 | 10 | tabplot, tabplotGTK, whisker, ffbase, editrules, docopt, chunked, tabplotd3, validate, deducorrect |
Kurt Hornik | 1,476 | 12 | movMF, ROI, qrmtools, RKEAjars, RWekajars, Unicode, NLP, openNLPdata, Rpoppler, NLPutils, W3CMarkupValidator, cclust |
Deepayan Sarkar | 1,369 | 4 | qtbase, qtpaint, lattice, qtutils |
Tyler Rinker | 1,203 | 9 | cowsay, wakefield, qdapTools, pacman, qdapRegex, qdap, qdapDictionaries, reports, regexr |
Yixuan Qiu | 1,131 | 12 | gdtools, svglite, highr, fun, rARPACK, showtext, R2SWF, R2SWF, rationalfun, recosystem, syss, showtextdb |
Revolution Analytics | 1,011 | 4 | doParallel, doSMP, revoIPC, checkpoint |
Torsten Hothorn | 948 | 7 | MVA, HSAUR3, TH.data, partykit, hgam, MUCflights, stabs |
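To make the credit-splitting rule behind these numbers concrete, here is a minimal base-R sketch of the author-splitting step. The helper name `splitCredit`, its `bonus` argument, and the example author string and download figure are made up for illustration; only the two regular expressions and the 1.2 factor are taken from the script. Like the script, it applies the factor whenever at least one name survives the filter.

```r
# Hypothetical helper (not part of the original script) that mirrors
# how a package's average downloads are divided among its authors.
splitCredit <- function(author, avg, bonus = 1.2) {
  # Split on commas, semicolons, or the word "and", as the script does
  authors <- strsplit(author, "[ ]*([,;]| and )[ ]*")[[1]]
  # Drop fragments that look like titles or suffixes rather than names
  authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
  if (length(authors) == 0) {
    # Nothing usable was found: keep the original string and value
    return(data.frame(author = author, avg = avg, stringsAsFactors = FALSE))
  }
  # Each remaining author gets an equal share, scaled by the bonus factor
  data.frame(author = authors,
             avg = avg / length(authors) * bonus,
             stringsAsFactors = FALSE)
}

# Made-up input: one package with three listed authors and 300 downloads/day
credit <- splitCredit("Hadley Wickham, Romain Francois and RStudio", avg = 300)
print(credit)
```

Because of the bonus factor the shares add up to more than the package's own average, so the per-coder totals in the table should be read as a ranking score rather than as actual download counts.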
My own 2015-R-experience
My own R experience has been dominated by magrittr and dplyr, as seen in the code above. Like most, I find that magrittr makes things a little easier to read, and unless I have some really large dataset the overhead is small. It does have some downsides related to debugging, but these are negligible. When I originally tried dplyr out I came from the plyr environment and was disappointed by the lack of parallelization; I also found the concepts a little odd when thinking the plyr way. I had been using sqldf a lot in my data munging and merging, so when I found left_join, inner_join, and the brilliant anti_join I was completely sold. Combined with RStudio I find the dplyr workflow both intuitive and more productive than my previous one. When looking at those packages (including more than just the top 10 here) I did find some additional gems that I intend to look into when I have the time:
- DiagrammeR An interesting new way of producing diagrams. I’ve used it for gantt charts but it allows for much more.
- checkmate A neat package for checking function arguments.
- covr An excellent package for testing how much of a package’s code is tested.
- rex A package for making regular expressions easier.
- openxlsx I wish I didn’t have to but I still get a lot of things in Excel-format – perhaps this package solves the Excel-import inferno…
- R6 The successor to reference classes – after working with the Gmisc::Transition-class I appreciate the need for a better system.
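For readers who have not met the dplyr joins mentioned above, the following is a small sketch of how the three differ. The two data frames (`pkgs_demo`, `downloads_demo`) and their values are invented purely for illustration and do not come from the CRANberries data.

```r
library(dplyr)

# Made-up lookup tables: packages with authors, and packages with download averages
pkgs_demo <- data.frame(
  name = c("dplyr", "magrittr", "git2r"),
  author = c("Hadley Wickham", "Stefan Milton Bache", "Stefan Widgren"),
  stringsAsFactors = FALSE)
downloads_demo <- data.frame(
  name = c("dplyr", "magrittr", "knitr"),
  avg = c(100, 50, 75),
  stringsAsFactors = FALSE)

# left_join: keep every row of pkgs_demo; unmatched rows get NA in avg
left <- pkgs_demo %>% left_join(downloads_demo, by = "name")

# inner_join: keep only the packages present in both tables
inner <- pkgs_demo %>% inner_join(downloads_demo, by = "name")

# anti_join: keep the rows of pkgs_demo that have NO match in downloads_demo
missing <- pkgs_demo %>% anti_join(downloads_demo, by = "name")

print(left)
print(inner)
print(missing)
```

The anti_join is the one that is awkward to express in plain SQL via sqldf (it needs a `NOT EXISTS` or a `LEFT JOIN ... WHERE ... IS NULL`), which is probably why it feels like the biggest win when checking which records failed to match.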