Learning from alternative R engines at DSC 2014
I was honoured to be invited earlier this month to the Directions in Statistical Computing meeting in Brixen, Italy. DSC is one of two meetings run by the R Project and, unlike the useR! conference, it is a much smaller and more intimate meeting (DSC 2014 had about 30 participants). If you haven't come across the DSC meeting before (quite possible, given that it was last held in 2009), R Core Group member Martyn Plummer has a nice overview of DSC.
A focus of the first day of the conference was the performance of the R computation engine. The organizers invited representatives from all of the “alternative” R engine implementations, and I believe it marked the first time that developers involved with pqR, Renjin, FastR, Riposte and TERR were gathered in the same place. (The CXXR project was unfortunately not represented.) Jan Vitek [slides] presented a fascinating comparison of the various projects, based on his interviews with the developers.
It was interesting to see the commonalities in many of the approaches. Three projects, Renjin [slides], FastR [slides] and Riposte [slides], use just-in-time compilation and an optimized bytecode engine. All have achieved impressive performance gains, but have struggled with compatibility (especially with being able to run the 6000+ CRAN packages). But it's clear that their work is having an influence on R itself: Tomas Kalibera [slides] (who previously worked on the FastR project) is working with Luke Tierney and Jan Vitek to improve the performance of R's bytecode interpreter.
Other approaches are also being pursued to improve the performance of the R engine. Luke Tierney [slides] described new improvements in R 3.1 to streamline the reference counting system, and noted that several of the performance improvements implemented by Radford Neal [slides] in pqR have already been incorporated into the R engine. And Helena Kotthaus [slides] has done some very exciting work to profile the performance of the R engine, which has already led to performance improvements when virtual memory is being used.
Overall, it was exciting to see collaboration and research into R as a language, and especially the attention from the computer science community to the implementation of R. As Robert Gentleman (co-creator of R and conference lead) noted, R now has a new community beyond statisticians and data scientists: computer scientists. It's exciting to see how R is incorporating learning and innovation from this new community.
For more on DSC 2014, see the reports from Martyn Plummer on Day 1 and Day 2 of the conference. The full program, with links to download the slide presentations, is at the link below.
DSC 2014: Schedule (and slide downloads)