Introduction
Over my time on LinkedIn, I have been exposed to people from many different lines of work. I have also carved out a space for myself there where I can post about Data Science topics and share my blogs along the way. There have always been posts and polls comparing R and Python, followed by debates among users over which language is superior for doing Data Science. While these sorts of arguments will never end, and I am far from innocent of engaging in them, I set out to understand why Data Science practitioners prefer one language over the other by “controlling” for exposure to the other language.
In this blog I am going to share the results of my LinkedIn polls comparing respondents' preferences. The polls asked for respondents' preferences for:
- Using dplyr in R vs pandas in Python for data wrangling,
- Using ggplot2 in R vs matplotlib and seaborn in Python for data visualization, and
- Using Jupyter notebooks vs RMarkdown for writing reports.
Disclaimer
This is by no means a formal study; it's more of just me sharing my findings in blog form. Social media platforms come and go, but a blog (albeit less popular) gives me a lasting place to post my curated content. Likely due to LinkedIn’s algorithms, my first and second questions got more traction, with over 132,000 views combined and over 1,600 and 1,300 votes respectively, while my last question got only a little more than 4,000 views and 106 votes at the time of writing.
To quote a comment on one of my polls:
With this in mind, let's look at the results of these polls.
(Visuals were made with ggplot2 and the ggtech package for the theme.)
1. dplyr vs pandas
As expected, most users who were pro-pandas had never used dplyr before. However, when controlling for prior experience, respondents were split pretty much 50-50 between pandas in Python and dplyr in R. Some comments recommended that I check out the data.table and dtplyr packages in R; while I don’t have much exposure to those packages at present, I hope to check them out in the future.
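To make the idea of “controlling” for prior experience concrete, here is a minimal Python sketch. The vote counts and option labels below are made up for illustration and are not the actual poll numbers; they only show how a lopsided overall vote can turn into an even split once respondents who have used only one tool are set aside.

```python
# Hypothetical vote counts per poll option (NOT the real LinkedIn results).
votes = {
    "pandas, never used dplyr": 500,
    "pandas, used both": 200,
    "dplyr, used both": 200,
    "dplyr, never used pandas": 100,
}


def share(part, whole):
    """Return part / whole as a fraction."""
    return part / whole


# Raw preference: every vote counts, regardless of experience.
pandas_votes = votes["pandas, never used dplyr"] + votes["pandas, used both"]
raw_pandas_share = share(pandas_votes, sum(votes.values()))

# "Controlled" preference: only respondents who have used both tools.
used_both = votes["pandas, used both"] + votes["dplyr, used both"]
controlled_pandas_share = share(votes["pandas, used both"], used_both)

print(f"pandas share, all respondents: {raw_pandas_share:.0%}")
print(f"pandas share, used both:       {controlled_pandas_share:.0%}")
```

With these made-up counts, pandas takes 70% of all votes but only 50% of the votes from respondents who have used both tools.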
For the closest experience to dplyr that I have had in Python, check out my review of the siuba module.
2. ggplot2 vs matplotlib and seaborn
Among users who had experience with both ggplot2 and the matplotlib and seaborn pairing, ggplot2 was preferred by 56%. Most users of matplotlib and seaborn don’t have experience with ggplot2, and vice versa.
I was told to check out the plotly library, which is available for both R and Python, and it really looks like a great library for building interactive dashboards and applications. While I don’t have much experience with it now, I do hope to check it out when time allows.
3. Using Jupyter notebooks vs RMarkdown for writing reports
The results from this poll are questionable, as I only got 106 replies. With this in mind, these are the results:
Of the users with experience using both RMarkdown and Jupyter notebooks for writing their reports, 63% prefer RMarkdown over Jupyter notebooks. However, more users have experience with Jupyter notebooks than with RMarkdown.
Conclusion
With all that being said, using dplyr in R or pandas in Python for data wrangling seems like a toss-up among users with experience with both languages. For data visualization, ggplot2 seems to be preferred over matplotlib or seaborn, and, if you trust the sample size, RMarkdown is preferred over Jupyter notebooks among users with experience with both.
In general, it is apparent that R is still the underdog as a language for Data Science and programming, but that by no means makes me intend to stop using it any time soon.
When I get the time, I look forward to giving data.table and plotly a spin!
Thank you for reading!