
Quick and dirty parallel processing in R

[This article was first published on Stat Bandit » R, and kindly contributed to R-bloggers].

R has some powerful tools for parallel processing, which I discovered while searching for ways to fully utilize my 8-core computer at work. What surprised me is how easy it is: about 6 lines of code, if that. Given that I wasn't allowed to install heavy-duty parallel-processing systems like MPICH on the computer, I found that the snow package fit the bill nicely through its use of sockets. I also discovered the packages foreach and iterators, which were released to the community by the development team at Revolution R. Using these 3 packages (snow through its foreach backend, doSNOW), I could easily parallelize a transformation of my dataset where the transformations happened within each unique ID. The following code did the trick:

    library(foreach)
    library(doSNOW)   # also attaches snow and iterators (for icount)

    cl <- makeCluster(6, type="SOCK")   # using 6 nodes
    registerDoSNOW(cl)

    uID <- unique(dat$ID)
    foreach(i=icount(length(uID))) %dopar% {
        transformData(dat[dat$ID==uID[i],])
    }

    stopCluster(cl)

Note that this is for a single computer with multiple processors. Doing this on a cluster may be more complicated, but this serves my purposes quite nicely. There are other choices for this, including the multicore package and others described in the CRAN Task View.
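For anyone who wants to try the pattern without my data, here is a minimal self-contained sketch. The toy data frame `dat`, the column `x`, and the body of `transformData` are all made up for illustration (the real transformation is not shown in this post), and I use 2 workers rather than 6 so it runs on a small machine. It assumes the foreach, iterators and doSNOW packages are installed.

```r
library(foreach)
library(iterators)  # provides icount()
library(doSNOW)     # foreach backend for snow clusters

# Hypothetical toy data: 3 IDs with 4 rows each
dat <- data.frame(ID = rep(c("a", "b", "c"), each = 4), x = 1:12)

# Hypothetical per-ID transformation: centre x within each ID
transformData <- function(d) {
  d$x_centered <- d$x - mean(d$x)
  d
}

cl <- makeCluster(2, type = "SOCK")  # 2 local socket workers
registerDoSNOW(cl)

uID <- unique(dat$ID)

# One task per unique ID; .combine = rbind stacks the pieces
# back into a single data frame, in the order of uID
result <- foreach(i = icount(length(uID)), .combine = rbind) %dopar% {
  transformData(dat[dat$ID == uID[i], ])
}

stopCluster(cl)
```

Without `.combine`, foreach returns a list with one element per ID, which is what the code above in the post does. Adding `.combine = rbind` is just a convenience when each piece is a data frame.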

Update: I found that this strategy did not work on the Windows versions of R 2.11, since snow does not spawn processes properly there. However, there is a package, doSMP, provided by Revolution Analytics that gets around this problem. So replacing doSNOW with doSMP should do the trick.
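If neither backend is an option, a different route (not the one used in this post) is the parallel package, which has shipped with R itself since version 2.14.0 and includes socket clusters derived from snow. The sketch below is only an illustration under that assumption: it reuses a made-up toy data frame and a made-up `transformData`, splits the data by ID, and hands one piece to each task.

```r
# Hypothetical toy data and transformation, for illustration only
dat <- data.frame(ID = rep(c("a", "b", "c"), each = 4), x = 1:12)

transformData <- function(d) {
  d$x_centered <- d$x - mean(d$x)
  d
}

# Functions are namespace-qualified because snow exports
# functions with the same names (makeCluster, parLapply, stopCluster)
cl <- parallel::makeCluster(2)  # 2 local socket workers

# split() yields one data frame per unique ID
pieces <- split(dat, dat$ID)

# Apply the transformation to each piece on the workers
result_list <- parallel::parLapply(cl, pieces, transformData)

parallel::stopCluster(cl)

# Stack the transformed pieces back into one data frame
result2 <- do.call(rbind, result_list)
```

This drops foreach entirely, so it is a swap of technique rather than a drop-in replacement for the `%dopar%` loop.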

