Comparing performance in R, foreach/doSNOW, SAS, and NumPy (MKL)
[This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is a follow-up to my previous post. There is a quicker way to compute the function I created (a basic cumulative sum) in R.
Instead of:
f = function(x) {
  sum = 0
  for (i in seq(1, x)) sum = sum + i
  return(sum)
}

Use this:
f2 = function(x) {
  return(sum(seq(x)))
}

If I time it, we see:
system.time((out = apply(as.array(seq(10000)), 1, f2)))
   user  system elapsed
   0.35    0.05    0.39

Nice! Spread that across 3 CPUs and we can bring it down a bit:
library(doSNOW)       # also attaches foreach and snow
cl = makeCluster(3)   # 3 worker processes
registerDoSNOW(cl)    # use them as the %dopar% backend
system.time((out2 = foreach(i = seq(0, 9), .combine = 'c') %dopar% {
  apply(as.array(seq(i*1000 + 1, (i + 1)*1000)), 1, f2)
}))
   user  system elapsed
   0.02    0.00    0.26

(The small user time covers only the master R process. The work runs in the workers, so elapsed time is the figure to compare.)

Not too shabby. How fast can we do this in SAS?
options cmplib=work.fns;

proc fcmp outlib=work.fns.fns;
  function csum(x);
    sum = 0;
    do i = 1 to x;
      sum = sum + i;
    end;
    return (sum);
  endsub;
run;

data _null_;
  do i = 1 to 10000;
    x = csum(i);
  end;
run;

NOTE: DATA statement used (Total process time):
      real time           0.24 seconds
      cpu time            0.25 seconds

SAS on a single CPU is just as fast as R on 3 (0.24 seconds real time versus 0.26 seconds elapsed). It's not worth attempting to multi-thread this in SAS. The overhead would be too much, because SAS/CONNECT is made for bigger problems.
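For readers following along in Python, here is a minimal sketch of the two ideas from the R section. The first is replacing the explicit loop (f) with a built-in reduction (f2). The second is splitting the 10,000 calls into 10 chunks of 1,000 and concatenating the results in order, the way foreach(..., .combine='c') does. The names loop_sum, builtin_sum, chunk_bounds, apply_chunk, and chunked_map are hypothetical. The thread pool is only a stand-in for the doSNOW cluster: under CPython's global interpreter lock, threads will not speed up pure-Python arithmetic. So this shows the partitioning, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def loop_sum(x):
    """Explicit loop, like the R function f: adds 1..x one term at a time."""
    total = 0
    for i in range(1, x + 1):
        total += i
    return total


def builtin_sum(x):
    """Built-in reduction, like f2 = function(x) sum(seq(x))."""
    return sum(range(1, x + 1))


def chunk_bounds(n_chunks=10, chunk_size=1000):
    """Chunk i covers i*chunk_size + 1 .. (i + 1)*chunk_size, as in the foreach call."""
    return [(i * chunk_size + 1, (i + 1) * chunk_size) for i in range(n_chunks)]


def apply_chunk(bounds):
    """Apply builtin_sum to every value in one chunk (the body of %dopar%)."""
    lo, hi = bounds
    return [builtin_sum(v) for v in range(lo, hi + 1)]


def chunked_map(n_chunks=10, chunk_size=1000, workers=3):
    """Run the chunks on a pool (a stand-in for the cluster) and concatenate them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the concatenation
        # matches the serial result, like .combine = 'c'
        pieces = pool.map(apply_chunk, chunk_bounds(n_chunks, chunk_size))
        return [value for piece in pieces for value in piece]


if __name__ == "__main__":
    # The loop and the built-in reduction agree (checked on 1..1000 to keep it quick)
    assert all(loop_sum(v) == builtin_sum(v) for v in range(1, 1001))
    out2 = chunked_map()
    print(len(out2), out2[-1])  # 10000 50005000
```

Swapping ThreadPoolExecutor for concurrent.futures.ProcessPoolExecutor could spread the chunks across CPUs, much as the doSNOW workers do, because every function here is defined at module level.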
So what about NumPy in Python? If we use the version compiled with MKL, we ought to be able to do this reduction blazingly fast, since MKL should use the SSE registers on the processor. Further, we'll use the "fromfunction" method, which lets us pass a function to the array-creation method. fromfunction calls that function once, with arrays of index coordinates.
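import numpy as np
import time

def f(x, y):
    # fromfunction passes float index arrays: x holds 0..9999, y is all zeros
    x = x + 1             # shift to 1..10000
    return np.cumsum(x)   # running totals 1, 3, 6, ... (flattened to 1-D)

s = time.time()
y = np.fromfunction(f, (10000, 1))
el = time.time() - s
print("%0.6f" % el)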
0.002000
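Keep one caveat in mind when reading the 0.002-second figure. np.cumsum produces all 10,000 running totals in a single pass. The R and SAS code instead recomputes each total from scratch, which comes to about 50 million additions in all. The sketch below assumes only NumPy. It shows what fromfunction hands to f, and it checks that the single-pass result matches the per-element sums. The helpers per_element and indices_seen_by_f are hypothetical; per_element mirrors the R apply() call.

```python
import numpy as np


def f(x, y):
    # fromfunction calls f once, with float index arrays of shape (n, 1)
    return np.cumsum(x + 1)  # flattened 1-D running totals


def per_element(n):
    """Recompute each total from scratch, like apply(seq(n), 1, f2) in R."""
    return np.array([np.arange(1, k + 1).sum() for k in range(1, n + 1)])


def indices_seen_by_f(shape):
    """Capture the arguments fromfunction passes, to show its calling convention."""
    seen = {}

    def spy(x, y):
        seen["x"], seen["y"] = x, y
        return x

    np.fromfunction(spy, shape)
    return seen["x"], seen["y"]


if __name__ == "__main__":
    y = np.fromfunction(f, (10000, 1))
    x_idx, y_idx = indices_seen_by_f((10000, 1))
    print(x_idx.shape, x_idx.dtype)  # (10000, 1) float64
    print(y.shape, y[-1])            # (10000,) 50005000.0
    # One cumulative pass gives the same totals as 10,000 separate sums
    assert np.array_equal(y, per_element(10000))
```

The single cumsum pass does roughly 10,000 additions in total. That difference in work, on top of any MKL effect, is part of why the NumPy time is so much smaller.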
To leave a comment for the author, please follow the link and comment on their blog: Adventures in Statistical Computing.