Site icon R-bloggers

Big changes behind the scenes in R 3.5.0

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A major update to R is now available. The R Core group has announced the release of R 3.5.0, and binary versions for Windows and Linux are now available from the primary CRAN mirror. (The Mac release is forthcoming.)

Probably the biggest change in R 3.5.0 will be invisible to most users — except by the performance improvements it brings. The ALTREP project has now been rolled into R to use more efficient representations of many vectors, resulting in less memory usage and faster computations in many common situations. For example, the sequence vector 1:1000000 is now represented just by its start and end value, instead of allocating a vector of a million elements as earlier versions of R would do. So while R 3.4.3 takes about 1.5 seconds to run x <- 1:1e9 on my laptop, it's instantaneous in R 3.5.0.

There have been improvements in other areas too, thanks to ALTREP. The output of the sort function has a new representation: it includes a flag indicating that the vector is already sorted, so that sorting it again is instantaneous. As a result, running x <- sort(x) is now free the second and subsequent times you run it, unlike earlier versions of R. This may seem like a contrived example, but operations like this happen all the time in the internals of R code. Another good example is converting a numeric to a character vector: as.character(x) is now also instantaneous (the coercion to character is deferred until the character representation is actually needed). This has significant impact in R's statistical modelling functions, which carry around a long character vector that usually contains just numbers — the row names — with the design matrix. As a result, the calculation:

d <- data.frame(y = rnorm(1e7), x = 1:1e7)
lm(y ~ x, data=d)

runs about 4x faster on my system. (It also uses a lot less memory: running the equivalent command with 10x more rows failed for me in R 3.4.3 but succeeded in 3.5.0.)

The ALTREP system is designed to be extensible, but in R 3.5.0 the system is used exclusively for the internal operations of R. Nonetheless, if you'd like to get a sneak peek on how you might be able to use ALTREP yourself in future versions of R, you can take a look at this vignette (with the caveat that the interface may change when it's finally released).

There are many other improvements in R 3.5.0 beyond the ALTREP system, too. You can find the full details in the announcement, but here are a few highlights:

  • All packages are now byte-compiled on installation. R's base and recommended packages, and packages on CRAN, were already byte-compiled, so this will have the effect of improving the performance of packages installed from Github and from private sources.
  • R's performance is better when many packages are loaded, and more packages can be loaded at the same time on Windows (when packages use compiled code).
  • Improved support for long vectors, by functions including object.size, approx and spline
  • Reading in text data with readLines and scan should be faster, thanks to buffering on text connections.
  • R should handle some international data files better, with several bugs related to character encodings having been resolved.

Because R 3.5.0 is a major release, you will need to re-install any R packages you use. (The installr package can help with this.) On my reading of the release notes, there haven't been any major backwardly-incompatible changes, so your old scripts should continue to work. Nonetheless, given the significant changes behind the scenes, it might be best to wait for a maintenance release before using R 3.5.0 for production applications. But for developers and data science work, I recommend jumping over to R 3.5.0 right away, as the benefits are significant. 

You can find the details of what's new in R 3.5.0 at the link below. As always, many thanks go to the R Core team and the other volunteers who have contributed to the open source R project over the years.

R-announce mailing list: R 3.5.0 is released

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.