Top 10 R Packages for Exploratory Data Analysis (EDA) (Bookmark this!)

[This article was first published on business-science.io, and kindly contributed to R-bloggers.]

Hey guys, welcome back to my R-tips newsletter. Today, I’m excited to share with you the Top 10 R Packages for Exploratory Data Analysis (EDA). These packages will help you streamline your data analysis workflow and gain deeper insights into your datasets. Let’s dive in!

Table of Contents

Here’s what you’re learning today:

  • Importance of Exploratory Data Analysis

  • Top 10 R Packages for EDA:
    • skimr
    • psych
    • corrplot
    • PerformanceAnalytics
    • GGally
    • DataExplorer
    • summarytools
    • SmartEDA
    • janitor
    • inspectdf
  • BONUS: 5 More Underrated EDA Libraries in R

  • Get the Code: Join the R-Tips Newsletter to get the code and stay updated.

Analyze Your Data Faster with gt_summarytools()

Get the Code (In the R-Tip 086 Folder)


SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on October 23rd

Inside the workshop I’ll share how I built a Machine Learning Powered Production Shiny App with ChatGPT (extends this data analysis to an insane production app):

ChatGPT for Data Scientists

What: ChatGPT for Data Scientists

When: Wednesday October 23rd, 2pm EST

How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free ChatGPT for Data Scientists workshop.

Price: Does Free sound good?

How To Join: 👉 Register Here


R-Tips Weekly

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?

Here are the links to get set up. 👇

This Tutorial is Available in Video (12-minutes)

I have a 12-minute video that walks you through these top 10 R packages for EDA and how to use them in R. (These are the ones I use most commonly) 👇

Importance of Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a crucial step in any data science project. It helps you understand the underlying structure of your data, identify patterns, detect anomalies, and test hypotheses. EDA enables you to make informed decisions about data cleaning, feature selection, and model selection.

Top 10 R Packages for EDA

To make your EDA process more efficient and insightful, here are the top 10 R packages you should know. Get the R code and dataset so you can follow along here.

Set Up the EDA Packages and Dataset in R:

First, make sure you install all of the R packages I’ll be demo-ing today. Then load the data set I’ll be using so you can reproduce the results. Run this code:
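
A rough sketch of that setup step, assuming all ten packages are installed from CRAN. The built-in `mtcars` data and the `eda_data` name are stand-ins for illustration, not the dataset from the newsletter:

```r
# The ten packages covered in this article
eda_packages <- c(
  "skimr", "psych", "corrplot", "PerformanceAnalytics", "GGally",
  "DataExplorer", "summarytools", "SmartEDA", "janitor", "inspectdf"
)

# Find which ones are not installed yet
missing_packages <- setdiff(eda_packages, rownames(installed.packages()))
print(missing_packages)

# Uncomment to install whatever is missing
# install.packages(missing_packages)

# Stand-in dataset (assumption): mtcars ships with base R
eda_data <- datasets::mtcars
eda_data$cyl <- factor(eda_data$cyl)  # treat cylinder count as categorical

str(eda_data)
```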

Libraries and Data

1. skimr: Summary of the Dataset

skimr provides a convenient and elegant summary of your data. Run this code:
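
Here is a minimal sketch of a skimr summary. It assumes the built-in `mtcars` data as a stand-in for the article's dataset:

```r
library(skimr)

# Stand-in dataset (assumption): built-in mtcars
# skim() returns one summary row per column, grouped by column type
skim_result <- skim(mtcars)
print(skim_result)
```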

skimr summary of dataset

2. psych: Descriptive Statistics

The psych package offers functions for psychological, psychometric, and personality research, including descriptive statistics. Run this code:
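
A minimal sketch using `describe()` from psych, again assuming the built-in `mtcars` data as a stand-in:

```r
library(psych)

# Stand-in dataset (assumption): built-in mtcars
# describe() returns n, mean, sd, median, skew, kurtosis, and more per variable
desc_stats <- describe(mtcars)
print(desc_stats)
```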

Get Descriptive Statistics with Psych

3. corrplot: Correlation Matrix Visualization

corrplot visualizes correlation matrices using several display methods (circles, squares, ellipses, numbers, and more). There are a ton of customizations you can do. Run this code:
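
One way this might look, assuming the built-in `mtcars` data as a stand-in. The styling arguments are just one reasonable choice among many:

```r
library(corrplot)

# Stand-in dataset (assumption): built-in mtcars, all columns numeric
cor_matrix <- cor(mtcars)

# Draw the upper triangle as circles, with variables ordered by
# hierarchical clustering so related variables sit together
corrplot(
  cor_matrix,
  method = "circle",
  type   = "upper",
  order  = "hclust",
  tl.col = "black"
)
```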

Correlation Matrix Visualization with Corrplot

4. PerformanceAnalytics: Correlation Matrix with Scatterplots and Histograms

PerformanceAnalytics provides advanced charts and statistical functions for financial analysis (I actually use PerformanceAnalytics inside my tidyquant package for easier financial analysis). But, most people have no idea it has an amazing chart.Correlation() function that is fast and awesome. Run this code:
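
A short sketch of `chart.Correlation()`. It assumes a handful of numeric columns from the built-in `mtcars` data as a stand-in:

```r
library(PerformanceAnalytics)

# Stand-in dataset (assumption): four numeric columns from mtcars
numeric_vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Scatterplots below the diagonal, histograms on it,
# and correlation coefficients above it
chart.Correlation(numeric_vars, histogram = TRUE, method = "pearson")
```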

PerformanceAnalytics Chart Correlation

5. GGally: Scatterplot Matrix with Pairwise Relationships

GGally extends ggplot2 by adding several functions to reduce the complexity of combining geometric objects. The ggpairs() function is one of my favorite functions for assessing Pairwise Relationships. So powerful. Run this code:
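
A minimal `ggpairs()` sketch, assuming a few columns of the built-in `mtcars` data as a stand-in. Coloring by `cyl` is an illustrative choice:

```r
library(GGally)

# Stand-in dataset (assumption): a few mtcars columns
plot_data <- mtcars[, c("mpg", "hp", "wt", "cyl")]
plot_data$cyl <- factor(plot_data$cyl)  # categorical, used for color

# One panel per pair of variables, colored by cylinder count
pair_plot <- ggpairs(plot_data, mapping = ggplot2::aes(colour = cyl))
print(pair_plot)
```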

GGally Pairwise Relationships

6. DataExplorer: Generate a Full EDA Report

DataExplorer automates the EDA process and generates comprehensive reports. Run this code:
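
A sketch of how this could look, assuming the built-in `mtcars` data as a stand-in. The report file name is made up for illustration, and the full report step is guarded because it writes an HTML file and needs pandoc:

```r
library(DataExplorer)

# Stand-in dataset (assumption): built-in mtcars
# Quick one-row overview: rows, columns, missing values, memory usage
overview <- introduce(mtcars)
print(overview)

# Histograms for every continuous column
plot_histogram(mtcars)

# Full HTML report; only run interactively since it writes to disk
if (interactive()) {
  create_report(mtcars, output_file = "mtcars_eda_report.html")
}
```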

DataExplorer

7. summarytools: Summary Table for the Dataset

summarytools provides tools to neatly and quickly summarize data. Run this code:
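
A minimal sketch with `dfSummary()`, assuming the built-in `mtcars` data as a stand-in:

```r
library(summarytools)

# Stand-in dataset (assumption): built-in mtcars
# dfSummary() builds one row per variable with stats, frequencies,
# a small distribution graph, and missing-value counts
summary_table <- dfSummary(mtcars)
print(summary_table)

# Open the formatted HTML version in the viewer when working interactively
if (interactive()) {
  view(summary_table)
}
```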

Summarytools

8. SmartEDA: Generate a Detailed EDA Report in HTML

SmartEDA creates automated EDA reports with detailed analyses. This is a newer package, but already I love it. Run this code:
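
A hedged sketch of the SmartEDA workflow, assuming the built-in `mtcars` data as a stand-in. The report file name is hypothetical, and the report step is guarded because it writes an HTML file:

```r
library(SmartEDA)

# Stand-in dataset (assumption): built-in mtcars
# type = 1 gives a high-level overview of the data
overview <- ExpData(data = mtcars, type = 1)
print(overview)

# type = 2 gives the structure of each variable
var_structure <- ExpData(data = mtcars, type = 2)
print(var_structure)

# Full HTML report; only run interactively since it writes to disk
if (interactive()) {
  ExpReport(mtcars, op_file = "smarteda_report.html")
}
```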

SmartEDA

9. janitor: Frequency Table for a Categorical Variable

janitor helps with data cleaning tasks, including frequency tables. We’ll use tabyl() to create a frequency table and the adorn_* functions to modify the table. Run this code:
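
A small sketch of `tabyl()` with two `adorn_*` helpers. It assumes the `cyl` column of the built-in `mtcars` data as a stand-in categorical variable, and uses the native pipe (R 4.1 or later):

```r
library(janitor)

# Stand-in dataset (assumption): cylinder counts in built-in mtcars
cyl_table <- mtcars |>
  tabyl(cyl) |>                     # counts and proportions per value
  adorn_totals("row") |>            # append a Total row
  adorn_pct_formatting(digits = 1)  # show proportions as percentages

print(cyl_table)
```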

Janitor Tabyl

10. inspectdf: Visualize Missing Values in the Dataset

inspectdf provides tools to visualize data frames, including missing values and correlations. Run this code:
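
A minimal sketch of the missing-value check. It assumes the built-in `airquality` data as a stand-in, chosen because it actually contains missing values:

```r
library(inspectdf)

# Stand-in dataset (assumption): built-in airquality, which has NAs
# inspect_na() returns the count and percentage of NAs per column
na_summary <- inspect_na(airquality)
print(na_summary)

# Turn the summary into a bar chart of missingness by column
na_plot <- show_plot(na_summary)
print(na_plot)
```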

InspectDF

Bonus: Five (5) Underrated EDA Libraries in R:

I had to call it quits at 10. But here are 5 more up-and-coming EDA libraries that are underrated:

  1. Radiant: A Shiny app for creating reproducible business and data analytics reports. Get my radiant deep dive here.

  2. Correlationfunnel: I use this R package all the time for quick correlation analysis and detecting critical relationships. Full Disclosure: I authored this R package. (Get the introduction here.)

  3. GWalkR: Like Tableau in R for $0. Get my GWalkR deep-dive here.

  4. Esquisse: Also like Tableau in R for $0. Get my Esquisse deep-dive here.

  5. Explore: A simple Shiny app for quickly exploring data. Get my explore deep-dive here.

Want the Full R Code?

To get access to the full source code for this tutorial, subscribe to the R-Tips Newsletter. This code is available exclusively to subscribers!

Get the Code (In the R-Tip 086 Folder)

Conclusion: Enhance Your Data Analysis Workflow

By using these top 10 R packages for EDA, you can significantly enhance your exploratory data analysis workflow, gain deeper insights, and make data-driven decisions more effectively.

But there’s more to becoming a data scientist.

If you would like to grow your Business Data Science skills with R, then please read on…

Need to advance your business data science skills?

I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.

I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.

And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):

6-Figure Data Science Job at CVS Health ($125K)
Senior VP Of Analytics At JP Morgan ($200K)
50%+ Raises & Promotions ($150K)
Lead Data Scientist at Northwestern Mutual ($175K)
2X-ed Salary (From $60K to $120K)
2 Competing ML Job Offers ($150K)
Promotion to Lead Data Scientist ($175K)
Data Scientist Job at Verizon ($125K+)
Data Scientist Job at CitiBank ($100K + Bonus)

Whenever you are ready, here’s the system they are taking:

Here’s the system that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…

What They're Doing - 5 Course R-Track

Join My 5-Course R-Track Program Now!
(And Become The Data Scientist You Were Meant To Be…)

P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.

Success Samantha Got The Job
