Super-FAST EDA in R with DataExplorer
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step by step how to do common R coding tasks.
Did you know most data scientists spend 80% of their time just trying to understand and prepare data for analysis?! This process is called Exploratory Data Analysis (EDA). R has an insane EDA productivity enhancer. It’s called DataExplorer.
Here are the links to get set up.
Use DataExplorer for EDA
Exploratory Data Analysis
You’re making this DataExplorer EDA Report!
Super-FAST Exploratory Data Analysis (EDA) in R
In this weekly R-Tip, we’re making an “EDA Report” with the DataExplorer R package. DataExplorer is an excellent package for Exploratory Data Analysis. In fact, it’s one of my top 3 EDA packages.
PRO TIP: I’ve added EDA on Page 3 of my Ultimate R Cheatsheet.
As you follow along, you can use my Ultimate R Cheatsheet. It consolidates the most important R packages (ones I use every day) into one cheatsheet.
EDA Report with Data Explorer
Automatic Exploratory Reporting
One of the coolest features of DataExplorer is the ability to create an EDA Report in 1 line of code. This automates:
- Basic Statistics
- Data Structure
- Missing Data Profiling
- Continuous and Categorical Distribution Profiling (Histograms, Bar Charts)
- Relationships (Correlation)
Ultimately, this saves the analyst/data scientist SO MUCH TIME.
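Here’s a minimal sketch of that one line. It assumes DataExplorer is installed from CRAN (plus rmarkdown and pandoc, which the HTML report needs), and it uses R’s built-in airquality data as a stand-in for your own data frame. The Month conversion, the eda_report.html file name, and writing to tempdir() are illustrative choices, not requirements.

```r
# Sketch: one-call EDA report with DataExplorer.
# Assumes install.packages("DataExplorer") has been run and pandoc is available.
library(DataExplorer)

# Illustrative data: built-in airquality, which has missing values in Ozone and Solar.R
aq <- airquality
# Turn Month (5-9) into a factor so the report also profiles a categorical column
aq$Month <- factor(month.abb[aq$Month])

# The one line: basic stats, structure, missing data, distributions, correlations
report_dir <- tempdir()
create_report(aq, output_file = "eda_report.html", output_dir = report_dir)
```

Swap aq for your own data frame; the report picks out continuous and categorical columns on its own.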
DataExplorer EDA Plots
Add the important DataExplorer report plots to your R-Code
DataExplorer just makes EVERYTHING SO EASY. Here’s an example of the output of plot_correlation(). In one line of code, we get a correlation heatmap with the categorical data dummy-encoded (one indicator column per category).
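A minimal sketch of that call, again using R’s built-in airquality data as an assumed example. Month is turned into a factor so there is a categorical column to dummy-encode, and incomplete rows are dropped first because base R’s cor() returns NA for any pair of columns that contains missing values.

```r
# Sketch: correlation heatmaps with DataExplorer (example data is an assumption)
library(DataExplorer)

aq <- airquality
aq$Month <- factor(month.abb[aq$Month])   # categorical column to dummy-encode
aq_complete <- na.omit(aq)                # drop rows with NA before correlating

# All features: the Month factor is expanded into indicator columns
plot_correlation(aq_complete)

# Only the continuous features
plot_correlation(aq_complete, type = "continuous")
```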
It gets better. Everything is one line of code:
- plot_intro(): Plots the introduction to the dataset
- plot_missing(): Plots the missing data
- plot_density() and plot_histogram(): Plot the continuous feature distributions
- plot_bar(): Plots bar charts for categorical distributions
- plot_correlation(): Plots relationships
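Here’s a sketch that runs these one-liners on the same kind of illustrative data (R’s built-in airquality, with Month turned into a factor so plot_bar() has something to count; both are assumptions, not part of the original tutorial). introduce() returns the summary table that plot_intro() visualizes.

```r
# Sketch: the individual DataExplorer plots, one line each
library(DataExplorer)

aq <- airquality
aq$Month <- factor(month.abb[aq$Month])   # one discrete column for plot_bar()

introduce(aq)       # rows, column types, missing values, complete rows (as a table)
plot_intro(aq)      # the same overview as a bar chart of percentages
plot_missing(aq)    # share of missing values in each column
plot_histogram(aq)  # histograms of the continuous columns
plot_density(aq)    # density curves of the continuous columns
plot_bar(aq)        # bar charts of the discrete columns (here: Month)
```

Each function selects the relevant columns itself, so you can pass the whole data frame.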
Here’s the output of plot_bar(). Wow! DataExplorer makes it that easy to make TIME-SAVING EDA VISUALIZATIONS.
You don’t need to be Bruce Almighty to do EDA fast anymore.
Just.Use.DataExplorer.