Snowfall
[This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers.]
Yesterday I had a short post reminding EViews users that their package (versions 7 or 8) will access all of the cores on a multi-core machine. I’ve been playing around with parallel processing in R on my desktop machine at work over the last few days. It’s something I’ve been meaning to do for a while, and it proved to be well worth the time.
Before I share my results with you, let me make a couple of comments.
First, parallel processing involves some costs in terms of communication overheads, so not all tasks are well-suited to this type of processing. It’s easy to generate examples that are computationally intensive, but execute faster on a single processor than on a cluster (of cores, or machines).
Second, even when a task is suitable for parallel processing, don’t expect the reduction in elapsed time to be linearly related to the increase in the number of cores. Remember, there are overheads involved!
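One rough way to see why the gain is not linear is Amdahl's law. This post does not invoke it, but it captures the same idea: if only a fraction p of a job can be run in parallel, the best possible speedup on k cores is 1 / ((1 - p) + p / k). The sketch below uses an assumed value of p = 0.9 purely for illustration, and the function name `amdahl_speedup` is made up for this example. It ignores communication costs, which make things worse in practice.

```r
# Amdahl's law: upper bound on the speedup when a fraction p of the work
# can be spread over k cores. p = 0.9 is an assumed, illustrative value.
amdahl_speedup <- function(k, p = 0.9) {
  1 / ((1 - p) + p / k)
}

cores   <- c(1, 2, 4, 8)
speedup <- amdahl_speedup(cores)

# Tabulate the bound for each core count.
print(round(data.frame(cores = cores, speedup = speedup), 2))
```

Under that assumption, even eight cores give a speedup of less than five, before any overheads are counted.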
Recently, there have been some posts out there that have illustrated the advantages of parallel processing in R. For example, WenSui Liu posted a piece describing some experiments run using the Ubuntu O/S. Also, Daniel Marcelino had a post that compared various “parallel” packages in R on a MacBook Pro. Nice choice of machine – it’s running UNIX beneath that pretty cover! And then, just as I was writing this post today, Arthur Charpentier came out with this related post, also based on results using a Mac.
However, none of these posts deals with a Windows environment, or with the sorts of Monte Carlo or bootstrap simulations that econometricians use all of the time. So, I felt that there was something more to explore.
The first thing that I discovered, after a lot of digging around, is that although there are a number of R packages to help with parallel processing, if you’re running Windows then your options are limited. O.K., that’s no surprise, of course! Don’t write comments saying that I should be using a different O/S if I want to engage in fast computing. I know that!
However, let’s stick with Windows. In that case it seems that the snowfall package for R is the best choice, currently. That’s what the results below are based on.
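For readers who have not used snowfall before, here is a minimal sketch of the basic workflow: start a cluster with sfInit(), hand work to it with sfLapply(), and shut it down with sfStop(). The toy task `slow_mean` and the choice of 4 CPUs are assumptions made for this illustration; they are not taken from the scripts used for the results in this post.

```r
library(snowfall)

# Start a local socket cluster. cpus = 4 is an assumed value;
# set it to suit your own machine.
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")

# A toy task (hypothetical): the mean of 100,000 standard normal draws.
slow_mean <- function(i) mean(rnorm(1e5))

# sfLapply() is snowfall's parallel counterpart of lapply().
res <- sfLapply(1:8, slow_mean)

# Always release the worker processes when you are done.
sfStop()

length(res)
```

Calling sfInit(parallel = FALSE) instead makes sfLapply() fall back to an ordinary sequential lapply(), which is handy for debugging.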
Well, here are a couple of small examples, run on my DELL desktop. It has an Intel i7-3770 processor (4 cores + hyperthreading), and 12 GB of RAM. I’m running Windows 7 (64 bit).
Test 1:
This test involves bootstrapping the sampling distribution of an OLS estimator. Of course, we know the answer – this is just an illustration of processing times!
There are 9,999 replications. The R script is on the code page for this blog, and it’s a slightly modified version of an example given by Knaus et al. (2009).
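To give a feel for what a test of this kind looks like, here is a rough sketch of a pairs bootstrap of an OLS slope using snowfall. It is not the script from the code page, nor the Knaus et al. example. The simulated data, the sample size of 100, the coefficient values, the choice of 4 CPUs, and the helper `boot_slope` are all assumptions made for illustration.

```r
library(snowfall)

# Simulated data (assumed DGP: intercept 1, slope 2, standard normal errors).
set.seed(123)
n   <- 100
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# One bootstrap replication: resample rows with replacement,
# re-estimate by OLS, and return the slope estimate.
boot_slope <- function(i) {
  idx <- sample(nrow(dat), replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
}

sfInit(parallel = TRUE, cpus = 4)   # cpus = 4 is an assumed value
sfExport("dat")                     # each worker needs its own copy of the data
slopes <- unlist(sfLapply(1:9999, boot_slope))
sfStop()

# Summary of the bootstrap distribution of the slope estimator.
c(mean = mean(slopes), sd = sd(slopes))
```

Note that set.seed() here only governs the simulated data; the resampling happens on the workers, so the bootstrap draws will differ from run to run. snowfall provides sfClusterSetupRNG() for setting up random number streams on the workers, if you need that.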
Test 2:
This test involves a Monte Carlo simulation of the power of a paired t-test, using 1,999 replications, and sample sizes of n = 10, 15, ..., 200 (that is, from 10 to 200 in steps of 5). Again, the R script is on the code page for this blog, and it’s a modified version of an example given by Spector (undated).
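A simulation of this type might be sketched as follows. Again, this is not the script from the code page, nor Spector's example. The data-generating process, the true mean difference `delta = 0.5`, the 5% significance level, the choice of 4 CPUs, and the helper `power_paired` are assumptions made for illustration; only the 1,999 replications and the grid of sample sizes come from the description above.

```r
library(snowfall)

# Estimated power of a paired t-test at sample size n.
# delta (the true mean difference) and alpha are assumed values.
power_paired <- function(n, delta = 0.5, reps = 1999, alpha = 0.05) {
  rejections <- replicate(reps, {
    before <- rnorm(n)
    after  <- before + rnorm(n, mean = delta)
    t.test(after, before, paired = TRUE)$p.value < alpha
  })
  mean(rejections)   # share of replications in which H0 is rejected
}

sizes <- seq(10, 200, by = 5)       # n = 10, 15, ..., 200

sfInit(parallel = TRUE, cpus = 4)   # cpus = 4 is an assumed value
est_power <- sfSapply(sizes, power_paired)
sfStop()

head(data.frame(n = sizes, power = est_power))
```

Here the parallelism is across sample sizes: each worker handles all of the replications for a given n, which keeps the amount of communication between the master and the workers small.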
The results when we allow R to access different numbers of cores are:
BTW – this is what I really enjoyed seeing – all of the cores on my machine running at full steam!
Of course, these processing times could be improved a lot by moving to an environment other than Windows! The point of the exercise, though, is simply to show you the effect of grabbing more cores when running simulations of the type that we use a lot in econometrics.
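If you want to run this sort of comparison on your own machine, a timing harness along the following lines is one way to do it. This is a sketch only: the task, the number of repetitions, the core counts, and the helpers `task` and `time_with` are assumptions made for illustration, and the timings you get will depend entirely on your hardware.

```r
library(snowfall)

# A moderately expensive toy task (hypothetical): simulate data, fit OLS,
# and return the slope estimate.
task <- function(i) {
  x <- rnorm(2e4)
  y <- 1 + 2 * x + rnorm(2e4)
  coef(lm(y ~ x))[[2]]
}

# Elapsed time (in seconds) for `reps` runs of the task on `cpus` cores.
# With cpus = 1, snowfall runs sequentially and sfLapply() acts like lapply().
time_with <- function(cpus, reps = 200) {
  sfInit(parallel = cpus > 1, cpus = cpus)
  on.exit(sfStop())
  system.time(sfLapply(seq_len(reps), task))[["elapsed"]]
}

cpu_counts <- c(1, 2, 4, 8)         # assumed values; adjust to your machine
timings    <- sapply(cpu_counts, time_with)

data.frame(cpus = cpu_counts, elapsed = timings)
```

The time taken to start the cluster is deliberately left outside system.time() here; for short jobs, that start-up cost can matter too.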
References
Knaus, J., C. Porzelius, H. Binder, & G. Schwarzer, 2009. Easier parallel computing in R with snowfall and sfCluster. The R Journal, 1/1, 54-59.
Spector, P., undated. Using the snowfall library in R. Mimeo., Statistical Computing Facility, Department of Statistics, University of California, Berkeley.
© 2013, David E. Giles