Site icon R-bloggers

New version of pqR, with major speed improvements

[This article was first published on R Programming – Radford Neal's blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve released pqR-2018-11-18, a new version of my variant implementation of R.  You can install it on Linux, Windows, or Mac as described at pqR-project.org. Installation must currently be from source, similarly to source installs of R Core versions of R.

This version has some major speed improvements, as well as some new features. I’ll details some of these improvements in future posts. Here, I’ll just mention a few things to show the flavour of the improvements in this release, and why you might be interested in pqR as an alternative to the R Core implementation.

Performance improvements

One landmark reached in this release is that it is no longer advisable to use the byte-code compiler in pqR. The speed of direct interpretation of R code has now been improved to the point where it is about as fast at executing simple scalar code as the byte-code interpreter. Eliminating the byte-code compiler simplifies the overall implementation, and avoids possible semantic differences between interpreted and byte-compiled code. It is also important for pqR because some pqR optimizations and some new pqR features are not implemented in byte-code. For example, only the interpreter does optimizations such as deferring vector operations so that they may automatically be merged with other operations or be done in parallel when multiple cores are available.

Some vector operations have been substantially sped up compared to the previous release of pqR-2017-06-09. The improvement compared to R-3.5.1 can be even greater. Here is an example of replacing a subset of vector elements, benchmarked on an Intel “Skylake” processor, with both pqR-2018-11-18 and R-3.5.1 compiled from source with gcc 8.2.0 at optimization level -O3:

Here’s R-3.5.1:

> a <- numeric(20000)
> system.time(for (i in 1:100000) a[2:19999] <- 3.1)
   user  system elapsed 
  4.211   0.148   4.360 

And here’s pqR-2018-11-18:

> a <- numeric(20000)
> system.time(for (i in 1:100000) a[2:19999] <- 3.1)
   user  system elapsed 
  0.256   0.000   0.257 

So the current R Core implementation is 17 times slower than pqR for this replacement operation.
< !--- (The pre-compiled version of R-3.5.1 available from r-project.org takes 4.632 seconds elapsed time for this example. The elapsed time for the R Core development version of 2018-11-25 is 4.359 seconds.) -->

The advantage of pqR isn’t always this large, but many vector operations are sped up by smaller but still significant factors. An example:

With R-3.5.1:

> a <- seq(0,1,length=2000); b <- seq(1,0,length=2000)
> system.time (for (i in 1:100000) {
+       d <- abs(a-b); r <- sum (d>0.4 & d<0.7) })
   user  system elapsed 
  1.215   0.015   1.231 

With pqR-2018-11-18:

> a <- seq(0,1,length=2000); b <- seq(1,0,length=2000)
> system.time (for (i in 1:100000) {
+       d <- abs(a-b); r <- sum (d>0.4 & d<0.7) })
   user  system elapsed 
  0.654   0.008   0.662 

So for this example, pqR is almost twice as fast.

For some operations, pqR’s implementation has lower asymptotic time complexity, and so can be enormously faster. An example is the following convenient coding pattern that R programmers are currently warned to avoid:

With R-3.5.1:

> n <- 200000; a <- numeric(0);
> system.time (for (i in 1:n) a <- c(a,(i+1)^2))
   user  system elapsed 
 30.387   0.223  30.612 

With pqR-2018-11-18:

> n <- 200000; a <- numeric(0);
> system.time (for (i in 1:n) a <- c(a,(i+1)^2))
   user  system elapsed 
  0.040   0.004   0.045 

In R-3.5.1, extending a vector one element at a time with “c” takes time growing as n2, as a new vector is allocated when each element is appended. With the latest version of pqR, the time grows only as n log n. In this example, that leads to pqR being 680 times faster, but the ratio could be made arbitrarily large by increasing n.

It’s still faster in pqR to preallocate a vector of length n, but only by about a factor of three, which would often be tolerable when writing one-off code if using “c” is more convenient.

New features

The latest version of pqR has some new features. As for earlier pqR versions, some new features are aimed at addressing design flaws in R that lead to unreliable code, and others are aimed at making R more convenient for programming and scripting.

One new convenience feature is that the paste and paste0 operations can now be written with new !! and ! operators. For example,

> city <- "Toronto"; province <- "Ontario"
> city ! "," !! province
[1] "Toronto, Ontario"

The !! operator pastes strings together with space separation; the ! operator pastes with no separation. Of course, ! retains its meaning of “not” when used as a unary operator; there is no ambiguity.

What next?

I’ll be writing some more blog posts regarding improvements in pqR-2018-11-18, and regarding some improvements in earlier pqR versions that I haven’t already blogged about. Of course, you can read about these now in the pqR NEWS file.

The main disadvantage of pqR is that it is not fully compatible with the current R Core version. It is a fork of R-2.15.0, with many, but not all, later changes incorporated. This affects what packages will work with pqR.

Addressing this compatibility issue is one thing that needs to be done going forward. I’ll discuss this and other plans — notably implementing automatic differentiation — in another future blog post.

I’m open to other people getting involved in this project. Of course, you can contribute now by trying out pqR and reporting any problems in the comments here or at the pqR issues page. Performance comparisons, especially on real-world applications, are also welcome.

Finally, for the paranoid, here are the shasum values for the compressed and uncompressed tar files that you can download from pqR-project.org:

89216dc76be23b3928c26561acc155b6e5ad32f3  pqR-2018-11-18.tar.gz
f0ee8a37198b7e078fa1aec7dd5cda762f1a7799  pqR-2018-11-18.tar

To leave a comment for the author, please follow the link and comment on their blog: R Programming – Radford Neal's blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.