Deferred evaluation in Renjin, Riposte, and pqR
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The previously sleepy world of R implementation is waking up. Shortly after I announced pqR, my “pretty quick” implementation of R, the Renjin implementation was announced at UserR! 2013. Work also proceeds on Riposte, with release planned for a year from now. These three implementations differ greatly in some respects, but interestingly they all try to use multiple processor cores, and they all use some form of deferred evaluation.
Deferred evaluation isn’t the same as “lazy evaluation” (which is how R handles function arguments). Deferred evaluation is purely an implementation technique, invisible to the user, apart from its effect on performance. The idea is to sometimes not do an operation immediately, but instead wait, hoping that later events will allow the operation to be done faster, perhaps because a processor core becomes available for doing it in another thread, or perhaps because it turns out that it can be combined with a later operation, and both done at once.
Below, I’ll sketch how deferred evaluation is implemented and used in these three new R implementations, and also comment a bit on their other characteristics. I’ll then consider whether these implementations might be able to borrow ideas from each other to further expand the usefulness of deferred evaluaton.
My pqR implementation (discussed also in my blog post here and others it links to), is the most conservative of these projects. It is based on R-2.15.0, and so retains compatibility with the huge number of R packages that work with that version of R (apart from any bugs that I may have introduced, and excepting any packages that use internal characteristics of the R interpreter that they weren’t supposed to rely on). Many of the performance improvements in pqR result simply from detailed rewrites of C code — indeed, the project began with my realization of just how much scope there must be for such improvements. But there are other more profound changes in pqR. Two relating to deferred evaluation are the use of helper threads to perform numerical computations concurrently on multicore systems, and to a lesser extent the use of variant returns to optimize some operations.
Renjin (see also this talk) is written in Java, but tries to retain compatibility by reusing many parts of R-2.14.2 (eg, by translating C components to Java). One of its goals is to allow R to use huge objects that don’t fit in memory. Perhaps related to this, its objects are immutable, which is apparently tolerable because it can create “views” of objects that are the same except for some simple changes. These views also play a role in how Renjin can defer computations until a tree of operations has been built, which can then be optimized and perhaps done in parallel.
Riposte (see also this paper), appears to be mostly written in C++, and to currently target only Intel processors. Of the three projects, Riposte’s implementation seems to be the most distinct from the current R Core implementation, which may perhaps lead to compatibility problems. However, this may also allow it to employ more sophisticated techniques. Its use of deferred evaluation seems generally similar to Renjin’s, but I expect that many details differ. (The available documentation on Renjin isn’t sufficient to really tell.)
So, when do these implementations defer evaluations? Consider the following R code, and assume that the variables a and b are numeric vectors of the same length:
f <- function (a,b) { u <- a*b+1; exp(u); } print(f(a,b)/2)
When evaluating this code, pqR, Renjin, and Riposte may all not do the multiply, not do the add, not do the exponentiation, and not do the division, until the final value is actually needed by print
.
In pqR, this deferral will happen when helper threads are enabled (provided the vectors a and b are sufficiently long). Tasks for performing the multiply, the add, the exponentiation, and the division will be scheduled as these operations are encounted when interpreting the code. Helper threads may immediately start performing these tasks, or they may not, if all the helper threads are busy. When the master thread needs the actual data to print, it will wait until all these tasks are finished, doing them itself if necessary. However, it’s possible that no wait will be needed — while the master thread was handling interpretive tasks such as returning from the function f, the helper threads may have finished all the computations, allowing printing to start immediately.
Renjin and Riposte also do the equivalent of scheduling tasks for the multiply, add, exponentiation, and division, though for Renjin this is seen as constructing a directed graph of operations, and for Riposte it is seen as creating a “vector trace”. They both will also ensure that these computations are actually performed when the result is needed for printing.
However, Renjin and Riposte differ from pqR in that, as far as I can tell, neither will start any of the vector computations above until the call of print
, even if there are idle processor cores. Because of this, it seems that only pqR can overlap interpretive operations in the master thread with numeric computations done in other threads. The advantage of not immedately starting these computations is that when the result is finally needed, the whole tree of computations is available, allowing optimizations to be performed. In particular, operations can be combined, so that possibly long intermediate vectors do not need to be allocated. In the above example, a single loop (perhaps split amongst processor cores) can read sucessive elements, i, of a and b, compute exp(a[i]*b[i]+1)/2
, and store the result (or even just print it without ever storing it).
This merging of operations can be beneficial even if only one processor core is available. Since the general apparatus for deferred evaluation is already implemented in pqR, adding a capability for merging operations might not be too much work, though I suspect the result would be a bit of a “hack” compared to what Renjin or Riposte do. Actually, pqR already does a very limited amount of merging via its variant result mechanism. For example, the evaluation of sum(exp(x))
will not store exp(x)
in memory, but just sum its elements as they are computed.
In the other direction, I wonder whether pqR’s immediate use of available processor cores could be incorporated into Renjin or Riposte. For very long vectors, starting work immediately may not help much, but I think many R programs work on vectors of moderate size (thousands of elements), for which the time it takes for the interpreter to reach the point where the result is needed may be comparable to the time required for the vector computation. Starting the computation immediately will then give a substantial speedup.
Finally, Renjin, Riposte, and pqR aren’t the only new implementations of R in the works. There are at least two more — FastR, which uses Java, and CXXR, which uses C++. As far as I can tell, neither of these projects involve deferred evaluation, but they may have other interesting ideas.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.