Hey guys, welcome back to my R-tips newsletter. Today, I’m excited to share with you the Top 10 R Packages for Exploratory Data Analysis (EDA). These packages will help you streamline your data analysis workflow and gain deeper insights into your datasets. Let’s dive in!
Table of Contents
Here’s what you’re learning today:
- Importance of Exploratory Data Analysis
- Top 10 R Packages for EDA:
  - skimr
  - psych
  - corrplot
  - PerformanceAnalytics
  - GGally
  - DataExplorer
  - summarytools
  - SmartEDA
  - janitor
  - inspectdf
- BONUS: 5 More Underrated EDA Libraries in R
- Get the Code: Join the R-Tips Newsletter to get the code and stay updated.
Get the Code (In the R-Tip 086 Folder)
SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on October 23rd
Inside the workshop I’ll share how I built a Machine Learning Powered Production Shiny App with ChatGPT
(extends this data analysis to an insane production app):
What: ChatGPT for Data Scientists
When: Wednesday October 23rd, 2pm EST
How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free ChatGPT for Data Scientists workshop.
Price: Does Free sound good?
How To Join: 👉 Register Here
R-Tips Weekly
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?
Here are the links to get set up. 👇
This Tutorial is Available in Video (12-minutes)
I have a 12-minute video that walks you through these top 10 R packages for EDA and how to use them in R. (These are the ones I use most commonly) 👇
Importance of Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a crucial step in any data science project. It helps you understand the underlying structure of your data, identify patterns, detect anomalies, and test hypotheses. EDA enables you to make informed decisions about data cleaning, feature selection, and model selection.
Top 10 R Packages for EDA
To make your EDA process more efficient and insightful, here are the top 10 R packages you should know. Get the R code and dataset so you can follow along here.
Set Up the EDA Packages and Dataset in R:
First, make sure you install all of the R packages I’ll be demo-ing today. Then load the data set I’ll be using so you can reproduce the results. Run this code:
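The exact setup code is in the R-Tip 086 folder. As a minimal sketch of the same step, assuming all ten packages are installed from CRAN and using the built-in mtcars data as a stand-in for the tutorial's dataset (the name eda_data is hypothetical):

```r
# The ten packages named in this article (assumption: all installed from CRAN)
pkgs <- c("skimr", "psych", "corrplot", "PerformanceAnalytics", "GGally",
          "DataExplorer", "summarytools", "SmartEDA", "janitor", "inspectdf")

# Install only the packages that are not already present
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Stand-in dataset (assumption): built-in mtcars, not the tutorial's data
eda_data <- mtcars
str(eda_data)
```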
1. skimr: Summary of the Dataset
skimr provides a convenient and elegant summary of your data. Run this code:
- I made a deeper writeup on skimr: Get the deep-dive here.
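The tutorial's own code is in the R-Tip 086 folder. A minimal sketch of a skimr summary, assuming the built-in iris data as a stand-in for the tutorial's dataset:

```r
library(skimr)

# Stand-in data (assumption): built-in iris, not the tutorial's dataset
skim_result <- skim(iris)

# One row per variable, with type-specific summary statistics
print(skim_result)
```

skim() groups its output by column type, so numeric and factor columns get different statistics in the printed summary.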
2. psych: Descriptive Statistics
The psych package offers functions for psychological, psychometric, and personality research, including descriptive statistics. Run this code:
- We’ll use the describe() function.
- I personally like to output tables, so optionally you can use gt::gt() to convert to a GT HTML table. (I made a deep dive on the GT R package here.)
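A minimal sketch of the describe() plus gt::gt() combination described above, assuming the built-in mtcars data as a stand-in for the tutorial's dataset. The names desc, desc_df, and desc_gt are illustrative only:

```r
library(psych)

# Stand-in data (assumption): built-in mtcars, not the tutorial's dataset
desc <- describe(mtcars)
print(desc)

# Optional: build a gt HTML table if the gt package is installed.
# describe() keeps variable names as row names, so copy them into a column first.
if (requireNamespace("gt", quietly = TRUE)) {
  desc_df <- as.data.frame(desc)
  desc_df <- cbind(variable = rownames(desc_df), desc_df)
  desc_gt <- gt::gt(desc_df)  # print desc_gt to render the table
}
```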
3. corrplot: Correlation Matrix Visualization
corrplot visualizes correlation matrices with a range of display methods (circles, squares, ellipses, numbers, and more). There are a ton of customizations you can do. Run this code:
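A minimal sketch of a corrplot call, assuming the built-in mtcars data as a stand-in for the tutorial's dataset. The specific options chosen here (circles, upper triangle, hierarchical ordering) are just one possible configuration:

```r
library(corrplot)

# Stand-in data (assumption): built-in mtcars, where every column is numeric
cor_mat <- cor(mtcars, method = "pearson")

# Draw the upper triangle as circles, with variables ordered by hierarchical clustering
corrplot(cor_mat, method = "circle", type = "upper",
         order = "hclust", tl.col = "black")
```

Note that corrplot() takes the correlation matrix, not the raw data frame, so cor() has to run first.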
4. PerformanceAnalytics: Correlation Matrix with Scatterplots and Histograms
PerformanceAnalytics provides advanced charts and statistical functions for financial analysis (I actually use PerformanceAnalytics inside my tidyquant package for easier financial analysis). But most people have no idea it has an amazing chart.Correlation() function that is fast and awesome. Run this code:
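A minimal sketch of chart.Correlation(), assuming a handful of numeric mtcars columns as a stand-in for the tutorial's dataset (the column selection is arbitrary):

```r
library(PerformanceAnalytics)

# Stand-in data (assumption): five numeric columns from built-in mtcars
num_cols <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Scatterplots below the diagonal, histograms on it, correlations above it
chart.Correlation(num_cols, histogram = TRUE, method = "pearson")
```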
5. GGally: Scatterplot Matrix with Pairwise Relationships
GGally extends ggplot2 by adding several functions to reduce the complexity of combining geometric objects. The ggpairs() function is one of my favorite functions for assessing pairwise relationships. So powerful. Run this code:
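A minimal sketch of ggpairs(), assuming the built-in iris data as a stand-in for the tutorial's dataset. Coloring by Species is an illustrative choice, not necessarily what the tutorial does:

```r
library(GGally)

# Stand-in data (assumption): built-in iris, not the tutorial's dataset
# Plot the four numeric columns pairwise, colored by the Species factor
pairs_plot <- ggpairs(iris, columns = 1:4,
                      mapping = ggplot2::aes(color = Species))
print(pairs_plot)
```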
6. DataExplorer: Generate a Full EDA Report
DataExplorer automates the EDA process and generates comprehensive reports. Run this code:
- I did a deeper dive on DataExplorer (Get my deep-dive here.)
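A minimal sketch of a DataExplorer workflow, assuming the built-in iris data as a stand-in for the tutorial's dataset. The output file name and the use of tempdir() are illustrative choices, and the report step is skipped when pandoc is not available:

```r
library(DataExplorer)

# Stand-in data (assumption): built-in iris, not the tutorial's dataset
# Quick structural overview: rows, columns, missing values, memory usage
intro <- introduce(iris)
print(intro)

# Full HTML report; rendering needs pandoc, so check for it first
if (rmarkdown::pandoc_available()) {
  create_report(iris, output_file = "eda_report.html", output_dir = tempdir())
}
```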
7. summarytools: Summary Table for the Dataset
summarytools provides tools to neatly and quickly summarize data. Run this code:
- I did a deep dive on summarytools (Get the deep dive here.)
- I’m a big fan of gt tables, so I converted summarytools to gt (get that article here.)
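A minimal sketch of a summarytools summary table, assuming the built-in iris data as a stand-in for the tutorial's dataset:

```r
library(summarytools)

# Stand-in data (assumption): built-in iris, not the tutorial's dataset
df_summary <- dfSummary(iris)

# Plain-text version in the console
print(df_summary)

# HTML version in the viewer or browser, only in an interactive session
if (interactive()) view(df_summary)
```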
8. SmartEDA: Generate a Detailed EDA Report in HTML
SmartEDA creates automated EDA reports with detailed analyses. This is a newer package, but already I love it. Run this code:
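A minimal sketch of SmartEDA, assuming the built-in iris data as a stand-in for the tutorial's dataset. The report file name and the use of tempdir() are illustrative, and the report step is skipped when pandoc is not available:

```r
library(SmartEDA)

# Stand-in data (assumption): built-in iris, not the tutorial's dataset
# type = 1 returns a dataset-level overview table
overview <- ExpData(data = iris, type = 1)
print(overview)

# Automated HTML report; rendering needs pandoc, so check for it first
if (rmarkdown::pandoc_available()) {
  ExpReport(iris, op_file = "smarteda_report.html", op_dir = tempdir())
}
```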
9. janitor: Frequency Table for a Categorical Variable
janitor helps with data cleaning tasks, including frequency tables. We’ll use tabyl() to create a frequency table and the adorn_* functions to modify the table. Run this code:
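A minimal sketch of the tabyl() and adorn_* pattern, assuming the cyl column of the built-in mtcars data as a stand-in for the tutorial's categorical variable:

```r
library(janitor)

# Stand-in data (assumption): cylinder counts from built-in mtcars
freq_tbl <- tabyl(mtcars, cyl)

# Add a totals row, then format the percent column as percentages
freq_tbl <- adorn_totals(freq_tbl, where = "row")
freq_tbl <- adorn_pct_formatting(freq_tbl, digits = 1)
print(freq_tbl)
```

The steps are written as separate assignments here, but they chain just as well with a pipe.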
10. inspectdf: Visualize Missing Values in the Dataset
inspectdf provides tools to visualize data frames, including missing values and correlations. Run this code:
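A minimal sketch of the missing-value view, assuming the built-in airquality data (which contains NA values) as a stand-in for the tutorial's dataset:

```r
library(inspectdf)

# Stand-in data (assumption): built-in airquality, which has missing values
na_summary <- inspect_na(airquality)

# One row per column with the count and percentage of missing values
print(na_summary)

# Bar chart of missingness by column
show_plot(na_summary)
```

The same show_plot() call works on the output of the other inspect_* functions, such as inspect_cor() for correlations.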
Bonus: Five (5) Underrated EDA Libraries in R:
I had to call it quits at 10. But here are 5 more up-and-coming EDA libraries that are underrated:
- Radiant: A Shiny app for creating reproducible business and data analytics reports. Get my radiant deep dive here.
- Correlationfunnel: I use this R package all the time for quick correlation analysis and detecting critical relationships. Full Disclosure: I authored this R package. (Get the introduction here.)
- GWalkR: Like Tableau in R for $0. Get my GWalkR deep-dive here.
- Esquisse: Also like Tableau in R for $0. Get my Esquisse deep-dive here.
- Explore: A simple Shiny app for quickly exploring data. Get my explore deep-dive here.
Want the Full R Code?
To get access to the full source code for this tutorial, subscribe to the R-Tips Newsletter. This code is available exclusively to subscribers!
Get the Code (In the R-Tip 086 Folder)
Conclusion: Enhance Your Data Analysis Workflow
By using these top 10 R packages for EDA, you can significantly enhance your exploratory data analysis workflow, gain deeper insights, and make data-driven decisions more effectively.
But there’s more to becoming a data scientist.
If you would like to grow your Business Data Science skills with R, then please read on…
Need to advance your business data science skills?
I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.
I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.
And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):
6-Figure Data Science Job at CVS Health ($125K)
Senior VP Of Analytics At JP Morgan ($200K)
50%+ Raises & Promotions ($150K)
Lead Data Scientist at Northwestern Mutual ($175K)
2X-ed Salary (From $60K to $120K)
2 Competing ML Job Offers ($150K)
Promotion to Lead Data Scientist ($175K)
Data Scientist Job at Verizon ($125K+)
Data Scientist Job at CitiBank ($100K + Bonus)
Whenever you are ready, here’s the system they are taking:
Here’s the system that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…
Join My 5-Course R-Track Program Now!
(And Become The Data Scientist You Were Meant To Be…)
P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.