Running the Same Task in Python and R

Posted on October 8, 2018 by John Mount in R bloggers

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

According to a KDD poll, a smaller share of respondents used only R in 2017 than in 2016. At the same time, a larger share of respondents used only Python in 2017 than in 2016.

Let’s use this as an excuse to take a quick look at what happens when we try the same task in both systems.

For our task, we picked the painful exercise of directly reading a 50,000,000-row by 50-column data set into memory on a machine with only 8 GB of RAM.
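The shape of this task can be reproduced at a much smaller scale before trying it at full size. The following is a minimal sketch under stated assumptions: the post does not describe the actual contents of the data set, so this generates a hypothetical all-integer file with 50 columns and only 10,000 rows, and `make_csv` and `time_read` are illustrative helper names, not part of any library. It reads with Python's standard `csv` module rather than pandas purely so the sketch has no third-party dependencies; it illustrates "read the whole file into memory and time it", not the readers the post compares.

```python
import csv
import os
import random
import tempfile
import time


def make_csv(path, n_rows, n_cols, seed=0):
    """Write a hypothetical all-integer CSV with a header row (illustrative data only)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"x{j}" for j in range(n_cols)])
        for _ in range(n_rows):
            writer.writerow([rng.randint(0, 1000) for _ in range(n_cols)])


def time_read(path):
    """Read the entire file into memory as integer rows; return (header, rows, seconds)."""
    start = time.perf_counter()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[int(v) for v in row] for row in reader]
    return header, rows, time.perf_counter() - start


if __name__ == "__main__":
    # Scaled-down shape: 50 columns as in the post, but far fewer than 50,000,000 rows.
    n_rows, n_cols = 10_000, 50
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.csv")
        make_csv(path, n_rows, n_cols)
        header, rows, secs = time_read(path)
        print(f"read {len(rows)} rows x {len(header)} columns in {secs:.2f}s")
```

A file produced this way can also be handed to `pandas.read_csv()` in Python, or to `utils::read.csv()`, `readr::read_csv()`, and `data.table::fread()` in R, to compare timings and peak memory at whatever size your machine can handle.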

In Python, the pandas package takes around 6 minutes to read the data, after which one is ready to work.

In R, both utils::read.csv() and readr::read_csv() fail with out-of-memory errors. So if your view of R is “base R only”, “base R plus tidyverse only”, or “tidyverse only”, reading this file is a “hard task.”

With the above narrow view, one would have no choice but to move to Python to get the job done.

Or we could remember data.table. While data.table is obviously not part of the tidyverse, it has been an R best practice for around 12 years. It reads the data and is ready to work in R in under a minute.

In conclusion, to get things done in a pinch: learn Python or learn data.table. And, in my opinion, “tidyverse first teaching” (commonly code for “tidyverse only teaching”) may not serve the R community well in the long run.
