Pegging your multicore CPU in Revolution R, Good and Bad

[This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers].

I take an almost unhealthy pleasure in pushing my computer to its limits. This has become easier with Revolution R and its free license for academic use. One of its best features is a debugger that lets you step through R code interactively, as you can with Python in PyDev. The other useful thing it packages is a simple way to run embarrassingly parallel jobs on a multicore box with the doSMP package.

    library(doSMP)

    # This declares how many processors to use.
    # Since I still wanted to use my laptop during the simulation, I chose cores-1.
    workers <- startWorkers(7)
    registerDoSMP(workers)

    # Make Revolution R not try to go multi-core, since we're already explicitly running in parallel.
    # Tip from: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
    setMKLthreads(1)

    # runs, N, ratingsPerQuestion, minRatings, trials, cutoff, studentScores,
    # scaleLow and scaleHigh are defined earlier in the simulation script.
    chunkSize <- ceiling(runs / getDoParWorkers())
    smpopts <- list(chunkSize=chunkSize)

    # This just lets me see how long the simulation ran.
    beginTime <- Sys.time()

    # This is the crucial piece. It parallelizes a for loop among the workers and
    # aggregates their results with cbind. Since my function returns
    # c(result1, result2, result3), r becomes a matrix with 3 rows and "runs" columns.
    r <- foreach(icount(runs), .combine=cbind, .options.smp=smpopts) %dopar% {
      # repeatExperiment is just a wrapper function that returns c(result1, result2, result3)
      tmp <- repeatExperiment(N, ratingsPerQuestion, minRatings, trials, cutoff, studentScores)
    }

    # Ask for minutes explicitly so the " mins" label in the plot title is always correct.
    runTime <- difftime(Sys.time(), beginTime, units="mins")

    # So now I can do something like this:
    boxplot(r[1,], r[2,], r[3,],
            main=paste("Distribution of Percent of rmse below ", cutoff,
                       "\n Runs=", runs, " Trials=", trials,
                       " Time=", round(runTime, 2), " mins\n",
                       "scale: ", scaleLow, "-", scaleHigh, sep=""),
            names=c("Ave3","Ave5","Ave7"))
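To make the shape of r concrete, here is a minimal sketch that mimics the loop sequentially in base R, without doSMP or foreach. It is not the code from the simulation: fakeExperiment is a hypothetical stand-in for repeatExperiment, and the values of runs and workers are made up. It only illustrates the chunk-size arithmetic and how column-binding length-3 results gives a matrix with 3 rows and "runs" columns.

```r
# Hypothetical stand-in for repeatExperiment(): returns three numbers per run.
fakeExperiment <- function() {
  c(result1 = runif(1), result2 = runif(1), result3 = runif(1))
}

runs    <- 20  # assumed number of simulation runs
workers <- 7   # assumed worker count, as in startWorkers(7)

# Same arithmetic as in the post: split the runs into roughly one chunk per worker.
chunkSize <- ceiling(runs / workers)

# Sequential analogue of foreach(..., .combine=cbind): each run's result
# vector becomes one column of the combined matrix.
set.seed(1)
r <- do.call(cbind, lapply(seq_len(runs), function(i) fakeExperiment()))

dim(r)     # 3 rows (one per result) and 20 columns (one per run)
chunkSize  # ceiling(20 / 7) is 3
```

Because each row holds one kind of result across all runs, r[1,], r[2,] and r[3,] can be passed straight to boxplot, as in the code above.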

If you are interested in finding out more about this, their docs are pretty good.

The only drawback is that Revolution R is a bit rough around the edges and crashes much more than it should. Worse, for me at least, the support forum doesn’t show any posts when I’m logged in, and I can’t post anything. Although I’ve filled out (what I think is) the appropriate web form, no one has gotten back to me about fixing my account. I’m going to try Twitter in a bit. Your mileage may vary.

Update: 6/9/2010 22:03 EST

Revolution Analytics responded to my support request after I mentioned it on Twitter. Apparently, they had done something to the forums which corrupted my account. Creating a new account fixed the problem, so now I can report the bugs that I find and get some help.

Update: 6/11/2010 16:03 EST

It turns out that you get a small speed improvement by calling setMKLthreads(1). Apparently, the libraries Revolution R links against attempt to use multiple cores by default. If you are explicitly programming in parallel, this means that your code is competing with itself for resources. Thanks for the tip!
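The contention is easy to see as arithmetic. The sketch below uses hypothetical numbers: an 8-core machine, 7 workers, and the assumption that the math library would otherwise start one thread per core inside every worker. The helper totalThreads is made up for illustration and is not part of doSMP or Revolution R.

```r
# Hypothetical numbers for illustration only.
cores   <- 8  # cores on the machine (assumed)
workers <- 7  # as in startWorkers(7)

# Total threads competing for the cores when every worker also runs
# a math library with threadsPerWorker threads.
totalThreads <- function(workers, threadsPerWorker) {
  workers * threadsPerWorker
}

# Assumed default: each worker's math library uses one thread per core.
oversubscribed <- totalThreads(workers, threadsPerWorker = cores)

# With setMKLthreads(1): one thread per worker.
pinned <- totalThreads(workers, threadsPerWorker = 1)

oversubscribed  # 56 threads contending for 8 cores
pinned          # 7 threads, leaving one core free
```

Under these assumptions, limiting each worker to a single thread keeps the thread count at or below the core count, which is consistent with the small speedup reported above.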
