R Packages Growth Curve
Why is R so popular? There are many reasons: it is easy to learn and convenient to use, it has an active community, it is open source, and so on. Another important reason is the large number of contributed packages. As of yesterday, there were 4033 R packages on CRAN. What does the growth curve of R packages look like over the past decade? How many packages were contributed to CRAN each month?
The following figure shows the growth curve of R packages:
[Figure: cumulative number of R packages on CRAN over time (interactive googleVis line chart, RpkgCurve1.html, not available)]
R is getting more and more popular, as can be seen from the number of packages contributed every month:
[Figure: number of R packages contributed to CRAN per month (interactive googleVis line chart, RpkgCurve2.html, not available)]
The first contributed R package is leaps (regression subset selection), uploaded by Thomas Lumley.
Here is the R code that produced the results above. The code also collects more information than is shown here, which will be used in upcoming posts.
```r
# Load packages needed;
library(XML)
library(googleVis)

# Set CRAN repository;
CRAN.mirr <- "http://cran.r-project.org/"
CRAN.home <- "web/packages/available_packages_by_name.html"

# Read in package names and descriptions;
pkg <- readHTMLTable(paste(CRAN.mirr, CRAN.home, sep = ""), skip = 1)[[1]]
names(pkg) <- c("Name", "Description")
pkg <- pkg[!is.na(pkg$Name), ]
pkg[, 1] <- as.character(pkg[, 1])
pkg[, 2] <- as.character(pkg[, 2])

# Define a function to convert date format "11-Jun-2011" to "2011-06-11";
as.posix <- function(x) {
  day <- substr(x, 1, 2)
  mth <- substr(x, 4, 6)
  yr  <- substr(x, 8, 11)
  Mth <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
  # Map each month abbreviation to a zero-padded month number;
  mth <- unlist(sapply(mth, FUN = function(x) {
    m <- which(Mth == x)
    if (nchar(m) == 1) m <- paste("0", m, sep = "")
    return(m)
  }))
  paste(yr, mth, day, sep = "-")
}

# Create a list to contain detail information of each package;
# This process will take about 15 minutes;
PKG <- list()
pb <- txtProgressBar(min = 0, max = nrow(pkg), style = 3)
for (i in seq_len(nrow(pkg))) {
  pkg.nam <- pkg$Name[i]
  pkg.url <- paste(CRAN.mirr, "web/packages/", pkg.nam, "/index.html", sep = "")
  pkg.des <- readHTMLTable(pkg.url)
  names(pkg.des) <- c("Description", "Downloads", "Dependency")[1:length(pkg.des)]
  # Packages with archived versions link to "Old sources:";
  if ("Old sources:" %in% pkg.des$Downloads$V1) {
    hist.url <- paste(CRAN.mirr, "src/contrib/Archive/", pkg.nam, sep = "")
    hist.dat <- readHTMLTable(hist.url, skip = 2)[[1]][, 2:3]
    names(hist.dat) <- c("Name", "Date")
    hist.dat <- hist.dat[!is.na(hist.dat$Name), ]
    hist.dat$Date <- as.posix(hist.dat$Date)
    pkg.des[["History"]] <- hist.dat
  }
  for (l in seq_along(pkg.des)) {
    pkg.des[[l]][, 1] <- as.character(pkg.des[[l]][, 1])
    pkg.des[[l]][, 2] <- as.character(pkg.des[[l]][, 2])
  }
  PKG[[pkg.nam]] <- pkg.des
  setTxtProgressBar(pb, i)
}
close(pb)

# Extract the date of the first version of each package;
# stringsAsFactors = FALSE keeps names as character so PKG is indexed by name;
pkg.trend <- data.frame(pkg.name = names(PKG), stringsAsFactors = FALSE)
for (i in seq_len(nrow(pkg.trend))) {
  pkg.nam <- pkg.trend$pkg.name[i]
  pkg.des <- PKG[[pkg.nam]]
  if ("History" %in% names(pkg.des)) {
    pkg.trend$Date.1[i] <- as.character(min(pkg.des$History$Date))
  } else {
    pkg.trend$Date.1[i] <-
      pkg.des$Description$V2[which(pkg.des$Description$V1 == "Published:")]
  }
}

# Aggregate the number of packages for each month;
pkg.trend$Date.2 <- paste(substr(pkg.trend$Date.1, 1, 7), "01", sep = "-")
pkg.trend$Date.2 <- as.POSIXct(pkg.trend$Date.2, format = "%Y-%m-%d")
pkg.dat <- with(pkg.trend,
                aggregate(list(Num = Date.2), list(Date = Date.2), length))
pkg.dat$Num1 <- cumsum(pkg.dat$Num)

# Display growth curves using googleVis;
Line1 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num1")
Line2 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num")
plot(Line1)
plot(Line2)
```
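The two core steps of the script, converting CRAN archive dates and counting first releases per month, can be tried without scraping anything. The following is only a minimal sketch on made-up data: the function name `to_iso_date` and the four sample dates are assumptions for illustration, not part of the original code. It uses base R's built-in `month.abb` constant in place of the hand-written month vector, and `table()` plus `cumsum()` in place of `aggregate()`.

```r
# Hypothetical helper (not from the original post): vectorised conversion
# of "11-Jun-2011" to "2011-06-11" using the built-in month.abb constant.
to_iso_date <- function(x) {
  day <- substr(x, 1, 2)
  mon <- match(substr(x, 4, 6), month.abb)  # "Jun" -> 6
  yr  <- substr(x, 8, 11)
  sprintf("%s-%02d-%s", yr, mon, day)
}

to_iso_date("11-Jun-2011")
# "2011-06-11"

# Made-up first-release dates standing in for the scraped archive data.
first_release <- c("11-Jun-2011", "03-Jun-2011", "25-Dec-2010", "07-Feb-2011")

# Keep only "YYYY-MM" so that every date falls into its month bucket.
month <- substr(to_iso_date(first_release), 1, 7)

# Count packages per month, then accumulate to get the growth curve.
per_month <- as.data.frame(table(Month = month),
                           responseName = "Num",
                           stringsAsFactors = FALSE)
per_month$Cumulative <- cumsum(per_month$Num)
print(per_month)
```

Here `Num` plays the role of `pkg.dat$Num` (packages per month) and `Cumulative` the role of `pkg.dat$Num1` (the growth curve). Matching against `month.abb` avoids depending on the session locale, which a `%b` date format would.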