
Obstacles to performance in parallel programming

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Making your code run faster is often the primary motivation for using parallel programming techniques in R, but sometimes the effort of converting your code to a parallel framework leads only to disappointment, at least initially. Norman Matloff, author of Parallel Computing for Data Science: With Examples in R, C++ and CUDA, has shared chapter 2 of that book online, and it describes some of the issues that can lead to poor parallel performance. They include:

  • Communications overhead, particularly an issue with fine-grained parallelism consisting of a very large number of relatively small tasks (see the sketch after this list);
  • Load imbalance, where the computing resources don't contribute equally to the problem;
  • Effects of RAM and virtual memory usage, such as cache misses and page faults;
  • Network effects, such as latency and bandwidth limits, which add to communication overhead;
  • Interprocess conflicts and thread scheduling;
  • Data access and other I/O considerations.
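
To make the first two items concrete, here is a minimal sketch using base R's parallel package. The worker count, vector length, and sleep durations are arbitrary illustration values, not taken from Matloff's chapter:

library(parallel)

cl <- makeCluster(2)  # two local workers; adjust to your machine

## 1. Communications overhead (fine- vs. coarse-grained tasks).
## clusterApply() sends one element per message, so the messaging
## dominates this trivial computation; parLapply() first splits the
## input into one chunk per worker, amortizing the communication.
x <- 1:5000
fine   <- system.time(clusterApply(cl, x, function(i) i^2))
coarse <- system.time(parLapply(cl, x, function(i) i^2))

## 2. Load imbalance (static vs. dynamic scheduling).
## The long tasks all sit at the front of the input, so parLapply()'s
## static pre-split hands them all to one worker while the other idles;
## clusterApplyLB() gives each task to the next free worker instead.
durations <- c(rep(0.25, 4), rep(0.01, 20))
static_t  <- system.time(parLapply(cl, durations, Sys.sleep))
dynamic_t <- system.time(clusterApplyLB(cl, durations, Sys.sleep))

stopCluster(cl)

rbind(fine, coarse, static_t, dynamic_t)

On most machines the fine-grained run should be noticeably slower than the coarse one despite doing identical work, and the load-balanced run should finish closer to the total work divided by the number of workers.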

The chapter is well worth a read for anyone writing parallel code in R (or indeed in any programming language). It's also worth checking out Norm Matloff's keynote from the useR! 2017 conference, embedded below.

Norm Matloff: Understanding overhead issues in parallel computation

