
R: parallel processing using multicore package

[This article was first published on compBiomeBlog, and kindly contributed to R-bloggers.]
I have been meaning to look at adding some parallel processing to R as I have some scripts that are painfully slow and embarrassingly parallel. There seem to be a lot of packages around for doing parallel computing, listed here.

I decided to look at multicore as it seemed easy to implement. The core of the package is the mclapply function, which is the multi-core version of lapply. Basically you install the package,

install.packages("multicore")

load the library,

library(multicore)

then replace any instances of lapply in your code with mclapply, and it will speed up your code! Easy.

Obviously there is more to it than this, and mclapply takes various options, such as the number of cores to use.
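For example, the mc.cores argument sets how many worker processes mclapply spawns; a minimal sketch:

library(multicore)
# restrict mclapply to two cores via the mc.cores option
x <- mclapply(1:4, function(i) mean(rnorm(1e6)), mc.cores = 2)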

To give a quick test:


# create a list of ten vectors of random numbers
test <- lapply(1:10, function(x) rnorm(10000))

# serial version
system.time(x <- lapply(test, function(x) loess.smooth(x, x)))
#   user  system elapsed
#  0.954   0.246   2.795

# parallel version
system.time(x <- mclapply(test, function(x) loess.smooth(x, x)))
#   user  system elapsed
#  0.896   0.898   0.914

So the elapsed time went down from 2.795 to 0.914 seconds, roughly a three-fold speed-up. Not bad.

The package also contains the parallel and collect functions, which let you run arbitrary expressions in parallel; collect then recovers the results once they are all finished.
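To give a flavour, a minimal sketch: parallel forks a process to evaluate an expression and returns a job object straight away, and collect blocks until the jobs finish and returns their results as a list.

# launch two expressions as separate forked processes
p1 <- parallel(mean(rnorm(1e6)))
p2 <- parallel(mean(runif(1e6)))
# wait for both jobs and gather their results into a list
res <- collect(list(p1, p2))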

I have only just started using it, but first impressions are good. 
