Speeding tickets for R and Stata

Murtaza Haider

11 years ago

[This article was first published on eKonometrics, and kindly contributed to R-bloggers].


How fast is R? Does it execute routines as quickly as off-the-shelf statistical software such as Stata? After some comparative experimentation, I found Stata to be 5 to 8 times faster than R.

Speed had not been a concern for me in the past. I had used R with smaller data sets of roughly 5,000 to 10,000 observations and found it as fast as other statistical software. More recently, I have been working with a larger, though still relatively small, data set of 63,122 observations. After realizing that R was very slow in executing the built-in routines for multinomial and ordinal logit models, I ran similar models in Stata on the same data set and found Stata to be much faster.

Before I go any further, I must confess that I did not try to improve speed in R by, for instance, choosing faster-converging algorithms. I hope readers will send me comments on how to speed up execution of the routines I tested in R.

My data set comprised 63,122 observations, with an ordinal dependent variable (5 categories) and categorical explanatory variables. I used a computer running Windows 7 on an Intel Core 2 Quad Q9300 CPU @ 2.5 GHz with 8 GB of RAM. Further details about the test are listed in the following table.

Model	Stata 11 (dual core)	R 2.12.0
Multinomial logit	mlogit: 9.06 sec	multinom: 50.59 sec; Zelig (mlogit): 77.89 sec; vglm (VGAM, multinomial): 64.40 sec
Proportional odds model	ologit: 1.69 sec	vglm (VGAM, parallel = TRUE): 16.26 sec; polr: 22.62 sec
Generalized ordered logit	gologit2: 18.67 sec	vglm (VGAM, parallel = FALSE): 64.71 sec
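The ordinal models in the table differ mainly in how many slope parameters they estimate. As a rough illustration (not the code behind any of the routines timed above), the Python sketch below computes the five category probabilities of an ordered logit from four cutpoints, using the convention P(Y ≤ j | x) = logistic(cutpoint_j − x·β_j) documented for polr and ologit; sign conventions vary across packages. Passing one shared slope vector gives the proportional odds (parallel = TRUE) model; passing a different slope vector for each cutpoint gives the generalized ordered logit estimated by gologit2 and by vglm with parallel = FALSE, which has more parameters to estimate. The function names and numeric values are hypothetical.

```python
import math


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x, cutpoints, slopes):
    """Category probabilities for an ordered logit with len(cutpoints) + 1 categories.

    Cumulative model: P(Y <= j | x) = logistic(cutpoints[j] - x . slopes[j]).
    Identical slope vectors for every j -> proportional odds (parallel) model.
    A different slope vector for each j -> generalized ordered logit.
    """
    cumulative = [
        logistic(c - sum(xi * bi for xi, bi in zip(x, b)))
        for c, b in zip(cutpoints, slopes)
    ]
    cumulative.append(1.0)  # P(Y <= J | x) = 1 for the top category
    # Successive differences of cumulative probabilities give P(Y = j | x).
    # With non-parallel slopes these can turn negative for some x.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical values: one predictor, 4 cutpoints -> 5 categories.
cutpoints = [-2.0, -1.0, 0.0, 1.0]
parallel = ordinal_probs([1.0], cutpoints, [[0.5]] * 4)
nonparallel = ordinal_probs([1.0], cutpoints, [[0.5], [0.4], [0.3], [0.2]])
print([round(p, 3) for p in parallel])
print([round(p, 3) for p in nonparallel])
```

The parallel model needs one slope per predictor, while the generalized version needs one per predictor per cutpoint, which is one plausible reason the non-parallel fits take longer.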

I first estimated the standard multinomial logit model in R using the multinom routine from the nnet package. R took almost 51 seconds to return the results. The subsequent call to summarise the model took another 52.29 seconds, bringing the total execution time in R to about 103 seconds. Surprised at the slow speed, I tried other ways to estimate the same model in R. I first tested the mlogit model in Zelig, which was even slower at 78 seconds. I followed up with the VGAM package, which did slightly better at 64.4 seconds.
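Because the summary step took about as long as the fit itself, it helps to time each stage separately and then add them, as was done above (50.59 + 52.29 ≈ 103 seconds). A minimal Python sketch of such a harness follows; `time_stages`, `fake_fit`, and `fake_summary` are hypothetical stand-ins, not part of any R or Stata API, and wall-clock timings like these vary from run to run.

```python
import time


def time_stages(stages):
    """Run (label, function) pairs in order and record wall-clock seconds for each.

    Each function receives the previous stage's return value (None for the first),
    so a 'fit' stage can hand its model object to a 'summary' stage.
    Returns a dict of per-stage timings plus their 'total'.
    """
    timings = {}
    result = None
    for label, fn in stages:
        start = time.perf_counter()
        result = fn(result)
        timings[label] = time.perf_counter() - start
    timings["total"] = sum(timings.values())  # evaluated before 'total' is added
    return timings


def fake_fit(_):
    # Hypothetical stand-in for an expensive estimation step.
    return sum(i * i for i in range(200_000))


def fake_summary(model):
    # Hypothetical stand-in for a summary step (e.g. computing standard errors).
    return str(model)


timings = time_stages([("fit", fake_fit), ("summary", fake_summary)])
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} sec")
```

Reporting the stages separately makes it clear whether a slow routine is slow at estimation, at post-estimation, or both.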

The other models in the table show R to be similarly slower than Stata.
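The size of the gap can be read off the table by dividing each R time by the corresponding Stata time. The short Python sketch below does that with the figures from the table; the dictionary layout is just one convenient way to hold them. It gives ratios from about 3.5 (vglm with parallel = FALSE versus gologit2) to about 13.4 (polr versus ologit), with the three multinomial logit routines between about 5.6 and 8.6.

```python
# Stata routine and time in seconds for each model, from the table.
STATA = {
    "multinomial logit": ("mlogit", 9.06),
    "proportional odds": ("ologit", 1.69),
    "generalized ordered logit": ("gologit2", 18.67),
}

# R routines fitted to the same models, with their times in seconds.
R_TIMES = {
    "multinomial logit": [("multinom", 50.59), ("Zelig mlogit", 77.89),
                          ("vglm multinomial", 64.40)],
    "proportional odds": [("vglm parallel=TRUE", 16.26), ("polr", 22.62)],
    "generalized ordered logit": [("vglm parallel=FALSE", 64.71)],
}


def speed_ratios(stata, r_times):
    """Return {R routine: R seconds / Stata seconds} for the matching model."""
    ratios = {}
    for model, (_, stata_sec) in stata.items():
        for r_routine, r_sec in r_times[model]:
            ratios[r_routine] = r_sec / stata_sec
    return ratios


ratios = speed_ratios(STATA, R_TIMES)
for routine, ratio in ratios.items():
    print(f"{routine}: {ratio:.1f}x slower than Stata")
```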

What could explain such a large difference in speed between R and Stata, up to an order of magnitude? Unfortunately, I don’t have the answer. I do know that Revolution Analytics publishes similar benchmark comparisons between its souped-up version of R (Revolution R) and generic R, in which Revolution R was found to be five to eight times faster than regular R.

Other benchmarks revealed even larger speed differences between Revolution R and generic R.

There must be ways to make routines execute faster in R. A few weeks earlier, Professor John Fox (a long-time contributor to R and the author of R Commander, a GUI for R) delivered a guest lecture at the GTA R Users’ Group meeting, held at the Ted Rogers School of Management in Toronto. His talk focused on programming in R, using the binary logit model as an example. His code for the binary logit was found to be much faster than the glm function that comes bundled with R.
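Professor Fox's code is not reproduced here. As a hedged illustration of what hand-coding a binary logit involves, the Python sketch below fits an intercept and one slope by Newton-Raphson, which for the logit link is the same iteration as the iteratively reweighted least squares used by R's glm. The function name, tolerance, and toy data are hypothetical; a serious implementation would use matrix algebra for many predictors and guard against perfectly separated data.

```python
import math


def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, iterations used).
    """
    b0, b1 = 0.0, 0.0
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Score vector (g) and information matrix (h = minus the Hessian).
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - p
            w = p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * step = g via the explicit inverse.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1, iteration


# Hypothetical toy data (not perfectly separated, so the MLE exists).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, iterations = fit_logit(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, iterations = {iterations}")
```

Newton's method converges quadratically near the optimum, so only a handful of iterations are typically needed; how much time each iteration costs depends on the implementation.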

This makes me wonder: are there ways to make the generic R run faster?
