RvsPython #6: LinkedIn has spoken!

[This article was first published on r – bensstats, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Over my time on LinkedIn, I have been exposed to people from many different lines of work, and I have managed to carve out a space for myself where I can post about Data Science topics and share my blogs along the way. There have always been posts and polls comparing R and Python, followed by debates among users of the two languages over which one is superior for doing Data Science. While these sorts of arguments will never end (and I am far from innocent of engaging in them), I set out to understand why Data Science practitioners prefer one language over the other by “controlling” for exposure to the other language.

In this blog I am going to share the results of my LinkedIn polls comparing respondents’ preferences. The polls asked respondents which they preferred for:

  1. Using dplyr in R vs pandas in Python for data wrangling,

  2. Using ggplot2 in R vs matplotlib and seaborn in Python for data visualization, and

  3. Using Jupyter notebooks vs RMarkdown for writing reports.

Disclaimer

This is by no means a formal study; it’s more of me sharing my findings in blog form. Social media platforms come and go, but having a blog where I can share my findings (albeit less popular) offers a place where I can post my curated content. Likely due to LinkedIn’s algorithms, my first and second questions got more traction, with over 132,000 views combined and over 1,600 and 1,300 votes respectively, while my last question got only a little more than 4,000 views and 106 votes at the time of writing.

To quote a comment on one of my polls:

With this in mind, let’s share the results of these polls.

(Visuals were made with ggplot2 and the ggtech package for the theme)

1. dplyr vs pandas

As expected, most respondents who preferred pandas had never used dplyr before. However, when controlling for prior experience, it was pretty much a 50-50 split between using pandas in Python and dplyr in R. Some commenters recommended that I check out the data.table and dtplyr packages in R; while I don’t have much exposure to those packages presently, I hope to check them out in the future.
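“Controlling” for prior experience here amounts to restricting the count to respondents who had used both tools before computing the preference split. The following is a minimal Python sketch of that arithmetic, assuming each poll’s options separated votes by preferred tool and by whether the respondent had used both tools. The option layout and the names `PollTally`, `raw_share` and `share_among_both` are hypothetical illustrations, not the actual poll wording or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollTally:
    """Hypothetical vote counts for one "tool A vs tool B" poll.

    Assumes the poll options split each preference by whether the
    respondent had used both tools; field names are illustrative only.
    """
    a_used_both: int  # prefer A, have used both A and B
    b_used_both: int  # prefer B, have used both A and B
    a_only: int       # prefer A, never used B
    b_only: int       # prefer B, never used A


def raw_share(t: PollTally) -> float:
    """Share of all votes going to A, ignoring prior experience."""
    total = t.a_used_both + t.b_used_both + t.a_only + t.b_only
    if total == 0:
        raise ValueError("poll has no votes")
    return (t.a_used_both + t.a_only) / total


def share_among_both(t: PollTally) -> float:
    """Share of votes going to A among respondents who used both tools."""
    both = t.a_used_both + t.b_used_both
    if both == 0:
        raise ValueError("no respondents reported using both tools")
    return t.a_used_both / both
```

Comparing `raw_share` with `share_among_both` on the same tally shows how a lopsided overall result can shrink to a near even split once respondents who only know one of the tools are left out of the denominator.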

For the closest experience I have found to dplyr in Python, check out my review of the siuba module.

2. ggplot2 vs matplotlib and seaborn

Among respondents with experience using both ggplot2 and matplotlib/seaborn, 56% preferred ggplot2. Most users of matplotlib and seaborn don’t have experience with ggplot2, and vice versa.

I was told to check out the plotly library, which is available for both R and Python, and it looks like a great library for building interactive dashboards and applications. While I don’t have much experience with it now, I do hope to check it out when time allows.

3. Jupyter notebooks vs RMarkdown for writing reports

The results from this poll should be taken with a grain of salt, as only 106 people replied. With this in mind, these are the results:

Of respondents with experience using both RMarkdown and Jupyter notebooks for writing reports, 63% prefer RMarkdown. However, more respondents have used Jupyter notebooks than RMarkdown.

Conclusion

With all that being said, using dplyr in R or pandas in Python for data wrangling seems like a toss-up among users with experience in both languages. For data visualization, ggplot2 seems to be preferred over matplotlib and seaborn, and, if you trust the sample size, RMarkdown is preferred over Jupyter notebooks among users with experience with both.

In general, it is apparent that R is still the underdog as a language for Data Science and programming, but that by no means makes me intend to stop using it any time soon.

When I get the time, I look forward to giving data.table and plotly a spin!

Thank you for reading!

To leave a comment for the author, please follow the link and comment on their blog: r – bensstats.
