One thing you always hear about R is how slow it is, especially when the code is not well vectorized or includes loops. But R is an interpreted language, and its strong suit really isn’t speed; its comparative advantage is the 4,284 packages on CRAN. We accept the slower speed in exchange for the time saved by not having to re-invent the wheel every time we want to do something new.
But that doesn’t mean it isn’t sometimes worth wondering how slow R is relative to other languages, especially with new tools like pandas in Python. I happened to be working on a Project Euler problem with the objective of calculating the first 10,001 prime numbers. I decided to see how R performed relative to my other primary languages, Python and C. I also wanted to see how R’s performance changed when I used sapply() and the new(ish) compiler package.
I took the same basic approach in each language by writing two functions. The first determines whether a number is prime or composite by trial division with the set {2, 3, 5, …, round(sqrt(number))}, stopping when a trial division has remainder 0 or when all possible divisors are exhausted. The second function steps through the odd numbers, counts the primes it finds, and returns the prime at the supplied index. I wrote versions in C, Python and R (with and without use of sapply()).
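To make the approach concrete, here is a minimal Python sketch of the two functions as described above. It is not the original code from the post: the names is_prime and nth_prime are hypothetical, and it uses math.isqrt (a floor, available in Python 3.8+) as the divisor limit where the description says round(sqrt(number)).

```python
import math


def is_prime(n):
    """Return True if n is prime, using trial division by 2 and odd divisors."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Assumption: floor(sqrt(n)) as the limit instead of round(sqrt(n)).
    limit = math.isqrt(n)
    for divisor in range(3, limit + 1, 2):
        if n % divisor == 0:
            return False  # stop at the first divisor with remainder 0
    return True  # all candidate divisors exhausted


def nth_prime(index):
    """Return the prime at the given 1-based index by scanning odd numbers."""
    if index < 1:
        raise ValueError("index must be at least 1")
    if index == 1:
        return 2  # the only even prime is handled separately
    count = 1      # 2 has already been counted
    candidate = 1
    while count < index:
        candidate += 2  # only odd numbers are considered
        if is_prime(candidate):
            count += 1
    return candidate


if __name__ == "__main__":
    print(nth_prime(10001))
```

Calling nth_prime(10001) gives the value the Project Euler problem asks for. Timings for a sketch like this will differ from the figures reported in this post.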
The results were mostly as expected:

    time ./euler7
    real 0m0.026s
    user 0m0.024s
    sys  0m0.000s

    time python euler7.py
    real 0m0.409s
    user 0m0.396s
    sys  0m0.004s

    time R CMD BATCH euler7.R
    real 0m7.058s
    user 0m6.268s
    sys  0m0.028s
C, the only compiled language, was really fast. It was nearly 16 times faster than Python and over 270 times faster than R. Python, in turn, was about 17 times faster than R. To paraphrase the SAT, C is to Python as Python is to R (for this problem).
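The ratios quoted here follow directly from the "real" (wall-clock) times reported by time. A short Python sketch of the arithmetic, where the timings dictionary and the speedup helper are illustrative names only:

```python
# Wall-clock ("real") times in seconds, taken from the timing output in this post.
timings = {"C": 0.026, "Python": 0.409, "R": 7.058}


def speedup(slow, fast):
    """How many times faster `fast` ran than `slow`."""
    return timings[slow] / timings[fast]


print(f"C vs Python: {speedup('Python', 'C'):.1f}x")  # about 15.7
print(f"C vs R:      {speedup('R', 'C'):.1f}x")       # about 271.5
print(f"Python vs R: {speedup('R', 'Python'):.1f}x")  # about 17.3
```

The C-to-Python and Python-to-R ratios are close to each other, which is what the analogy rests on.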
What about using sapply() and taking advantage of R’s functional programming? That was dreadful. Relative to the loops, using functional programming and sapply() actually increased runtime to 10.470 seconds.
Compiling the functions with cmpfun() from the compiler package reduced runtime to 2.408 seconds, down from the previous 7.058 seconds with loops and 10.470 seconds with sapply(). While still much slower than Python or C, this represents a significant performance increase for R relative to its state just a year ago.
Maybe we won’t have to depend on the incredible packages on CRAN for our comparative advantage forever.