Computing an Inner Product with RcppParallel
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The RcppParallel package includes
high level functions for doing parallel programming with Rcpp. For example,
the parallelReduce
function can be used aggreggate values from a set of
inputs in parallel. This article describes using RcppParallel to parallelize
the inner-product
example previously posted to the Rcpp Gallery.
Serial Version
First the serial version of computing the inner product. For this we use
a simple call to the STL std::inner_product
function:
#include <Rcpp.h> using namespace Rcpp; #include <algorithm> // [[Rcpp::export]] double innerProduct(NumericVector x, NumericVector y) { return std::inner_product(x.begin(), x.end(), y.begin(), 0.0); }
Parallel Version
Now we adapt our code to run in parallel. We’ll use the parallelReduce
function to do this. This function requires a “worker” function object
(defined below as InnerProduct
). For details on worker objects see the
parallel-vector-sum
article on the Rcpp Gallery.
// [[Rcpp::depends(RcppParallel)]] #include <RcppParallel.h> using namespace RcppParallel; struct InnerProduct : public Worker { // source vectors const RVector<double> x; const RVector<double> y; // product that I have accumulated double product; // constructors InnerProduct(const NumericVector x, const NumericVector y) : x(x), y(y), product(0) {} InnerProduct(const InnerProduct& innerProduct, Split) : x(innerProduct.x), y(innerProduct.y), product(0) {} // process just the elements of the range I've been asked to void operator()(std::size_t begin, std::size_t end) { product += std::inner_product(x.begin() + begin, x.begin() + end, y.begin() + begin, 0.0); } // join my value with that of another InnerProduct void join(const InnerProduct& rhs) { product += rhs.product; } };
Note that InnerProduct
derives from the RcppParallel::Worker
class. This
is required for function objects passed to parallelReduce
.
Note also that we use the RVector<double>
type for accessing the vector.
This is because this code will execute on a background thread where it’s not
safe to call R or Rcpp APIs. The RVector
class is included in the
RcppParallel package and provides a lightweight, thread-safe wrapper around R
vectors.
Now that we’ve defined the function object, implementing the parallel inner
product function is straightforward. Just initialize an instance of
InnerProduct
with the input vectors and call parallelReduce
:
// [[Rcpp::export]] double parallelInnerProduct(NumericVector x, NumericVector y) { // declare the InnerProduct instance that takes a pointer to the vector data InnerProduct innerProduct(x, y); // call paralleReduce to start the work parallelReduce(0, x.length(), innerProduct); // return the computed product return innerProduct.product; }
Benchmarks
A comparison of the performance of the two functions shows the parallel version performing about 2.5 times as fast on a machine with 4 cores:
x <- runif(1000000) y <- runif(1000000) library(rbenchmark) res <- benchmark(sum(x*y), innerProduct(x, y), parallelInnerProduct(x, y), order="relative") res[,1:4] test replications elapsed relative 3 parallelInnerProduct(x, y) 100 0.035 1.000 2 innerProduct(x, y) 100 0.088 2.514 1 sum(x * y) 100 0.283 8.086
Note that performance gains will typically be 30-50% less on Windows systems as a result of less sophisticated thread scheduling (RcppParallel does not currently use TBB on Windows whereas it does on the Mac and Linux).
You can learn more about using RcppParallel at https://github.com/RcppCore/RcppParallel.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.