Descriptive Analytics – Part 0 : data exploration
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?”.
This is the first set of exercise of a series of exercises that aims to provide a descriptive analytics solution to the ‘2008’ data set from here. Download it and save it as a csv file. This data set which contains the arrival and departure information for all domestic flights in the US from 2008 has become the “iris” data set for Big Data. In the exercises below we cover the basics of data exploration. I chose it to be the ‘part 0’ of the descriptive analytics solution, because in order to proceed to the data pre-processing and then description you need to get to know your data set while it is not formally on the value chain of descriptive analytics process. Before proceeding, it might be helpful to look over the help pages for the str
, summary
, dim
, nrow
, ncol
, names
, is.na
, match
functions.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Load the data before proceeding. Let’s name the dataset as ‘flights’
flights <- read.csv('2008.csv')
Exercise 1
Print the structure of the data. What do you think about it?
Exercise 2
Print the summary statistics of the data. What do you think about the values? (format, consistency, completeness)
Exercise 3
Print the dimensionality of the data (number of rows and columns)
Exercise 4
Print the number of rows. This may seem like a silly command, but it is quite useful for loops and if statements.
Exercise 5
Print the number of columns.
Exercise 6
Print the names of the variables.
Exercise 7
Print whether the first column has missing values (NAs). Try to answer this question with two ways. Hint: %in%
Exercise 8
Print the number of variables that contain missing values.
Exercise 9
Find the portion of the variables that contain missing values. What do you think about it?
Exercise 10
Print the names of the variables that contain missing values.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.