21+ Online Courses to Get Started Today with Data Cleaning

[This article was first published on R - Blendo Blog, and kindly contributed to R-bloggers].
Yeah… working with data sets means that you first need a way to get them. After you get them, you have to clean them.
It is often said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it.
And then you find yourself spending 80% of your time cleaning this data. At the same time, deadlines and management demands keep you up at night. This is one reason data analysts and data scientists regularly scour the web looking for anything that could help: tools, tutorials, resources. I have stumbled upon many posts about general Data Science MOOC courses or tutorials, but never one that lists resources for one of the most time-consuming processes in the data pipeline: data cleaning. In this post, I did my best to gather everything there is online. If you find a resource that I missed, please let me know in the comments below. Let’s start with the basics…

What is data cleaning?

Data cleaning, data cleansing or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Source: Wikipedia [1]

Note: Some of the courses below belong to specializations or batches of courses. For example, Coursera has a Data Science specialization and Udacity has a Nanodegree Program, but you may also take each course individually. If you are interested in a certificate, there is usually a fee. If not (for Coursera at least), you may “audit” the course. Some courses are free, and others are subscription-based services.
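
To make the definition of data cleaning above a bit more concrete, here is a minimal sketch in plain Python of "detecting and correcting (or removing)" bad records. Everything in it is an assumption made for illustration: the sample records, the field names `name` and `age`, the list of missing-value markers, and the 0 to 120 age range are all made up, and a real project would more likely use a library such as pandas or dplyr, which the courses below cover.

```python
from typing import Optional

# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": "n/a"},     # missing-value marker
    {"name": "alice", "age": "34"},    # duplicate of the first record once normalized
    {"name": "Carol", "age": "-5"},    # implausible value
    {"name": "", "age": "29"},         # missing name
    {"name": "dave", "age": " 41 "},   # valid, but needs trimming and case fixing
]

# Assumed spellings of "no value" in the raw data.
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def parse_age(text: str) -> Optional[int]:
    """Return the age as an int, or None if it is missing or implausible."""
    value = text.strip().lower()
    if value in MISSING_MARKERS:
        return None
    try:
        age = int(value)
    except ValueError:
        return None
    # The 0..120 range is an arbitrary plausibility check for this sketch.
    return age if 0 <= age <= 120 else None


def clean_records(records):
    """Correct what can be corrected and drop records that cannot be used."""
    cleaned = []
    seen = set()
    for record in records:
        # Correct: trim whitespace and normalize capitalization.
        name = record.get("name", "").strip().title()
        age = parse_age(record.get("age", ""))
        # Remove: records with a missing name or an unusable age.
        if not name or age is None:
            continue
        # Remove: exact duplicates after normalization.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


if __name__ == "__main__":
    for row in clean_records(RAW_RECORDS):
        print(row)
```

Of the six sample records, only the normalized Alice and Dave rows survive. Whether a bad record should be dropped, as here, or repaired or flagged instead depends on the data set and on what the analysis needs.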

Data Cleaning in R

Getting and Cleaning Data (Coursera)

Data Science and Machine Learning Essentials (edX)

Data Science with R (O’Reilly)

Cleaning Data in R (DataCamp)

Foundations of Data Science (Springboard)

Udemy Courses

You may want to take a look at the list of resources about data cleaning and R on Udemy. There are a lot to choose from, but it might take some searching to find the ones that are valuable to you.

Data Cleaning in Python

Data Science and Machine Learning Essentials

See the Data Science and Machine Learning Essentials (edX) course above.

Intro to Data Analysis – Data Analysis Using NumPy and Pandas (Udacity)

Data Wrangling with MongoDB – Data Manipulation and Retrieval (Udacity)

Python for Data Analysis (Big Data University)

Intermediate Python and Pandas (DataQuest)

Data Analysis and Visualization (DataQuest)

Data Science Intensive (Springboard)

Big Data Science with BD2K-LINCS (Coursera)

Exploring CO2 Emissions Data using Pandas data frames in Python (Big Data University)

Python Applications (DataQuest)

Python for Business Analysts (DataQuest)

Udemy Courses

You may want to take a look at the list of resources about data cleaning and Python on Udemy. There are a lot to choose from, but it might take some searching to find the ones that are valuable to you.

Data Cleaning (SQL, Spark etc.)

Introduction to Big Data Analytics (Coursera)

Working With Large Datasets (DataQuest)

Data Cleaning (OpenRefine, Tableau, Excel or other tools)

Introduction to OpenRefine (Big Data University)

How to clean your data (European Data Portal)

Data, Analytics and Learning (edX)

Data Analysis for your Business (edX)

Videos

While I was searching for these courses, I stumbled upon some great videos of conference presentations. I added them here in case anybody is interested.

Closure

I hope this list will help anyone who is looking to clean her data or is looking for a smooth start with the subject of data wrangling. If you know of any course that I missed, or if any of the above does not fit the list, please let me know in the comments or on Twitter below.
And if you liked it so far, you are more than welcome to share 🙂

References and some more links:

  1. Wikipedia – Data cleansing
  2. Databases, by Stanford University
  3. Intro to Data Cleaning, tutorial by School of Data
  4. R for business users: data cleaning, tutorial by Pluralsight