Enhancing R for Distributed Computing
Posted on February 10, 2015 by Tal Galili in R guest posts
- For those who need high-performance computing and are willing to program against a low-level interface, MPI and the R wrappers around it are a very good option.
- For in-memory processing, adding some form of distributed objects in R can potentially improve performance.
- Simple parallelism constructs, such as an lapply that operates on distributed data structures, may make distributed programming in R easier.
- Any high-level API should support multiple backends, each of which can be optimized for a specific platform, much like R’s snow and foreach packages run on any available backend.
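The backend-agnostic style described above can be sketched with foreach, whose loop code stays the same regardless of which backend is registered. This is a minimal example assuming the doParallel package is installed; swapping in another backend (e.g. an MPI-based one) would only change the registration step, not the loop.

```r
library(foreach)
library(doParallel)  # also loads the parallel package

cl <- makeCluster(2)     # a local two-worker cluster
registerDoParallel(cl)   # register the backend; the loop below is unchanged

# The same %dopar% loop runs on whatever backend is currently registered
squares <- foreach(x = 1:4, .combine = c) %dopar% x^2

stopCluster(cl)
squares  # 1 4 9 16
```

The key design point is the separation of concerns: foreach defines *what* to compute, while the registered backend decides *where and how* it executes.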