Beyond the basics of data.table: Smooth data exploration

Posted on September 5, 2017 by sindri in R bloggers

[This article was first published on R-exercises, and kindly contributed to R-bloggers].


This exercise set provides practice using the fast and concise data.table package. If you are new to the syntax, we recommend solving the exercise set on the basics of data.table before attempting this one.

We will use data on used cars (Toyota Corollas) on sale during 2004 in the Netherlands. There are 1436 observations, each with information on the price at which the car is offered for sale, its age, mileage and more; see the full variable description here.

Answers are available here.

Exercise 1
Load the data into your working environment using fread(). Don’t forget to load the data.table package first.

Exercise 2
Using one line of code, print out the most common car model in the data and the number of times it appears.
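
As a rough illustration of the counting idiom involved, here is a minimal sketch on made-up data. The table `dt` and its column `kind` are hypothetical and are not part of the Corolla data set.

```r
library(data.table)

# Toy data; `kind` is a hypothetical column name used only for illustration
dt <- data.table(kind = c("a", "b", "a", "c", "a", "b"))

# .N counts the rows in each group; the chained [order(-N)] sorts the counts
# in decreasing order, so the first row is the most frequent value
counts <- dt[, .N, by = kind][order(-N)]

print(counts[1])
```

Chaining a second `[` onto the grouped result keeps the whole operation in one expression.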

Exercise 3
Print out the mean and median price of the 10 most common models.

Exercise 4
Delete all columns that have Guarantee in their names.
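
One possible way to drop columns by a name pattern is sketched below on a toy table. The column names and the pattern "tmp" are invented for the example.

```r
library(data.table)

# Toy data; all column names here are hypothetical
dt <- data.table(id = 1:3, tmp_a = 4:6, keep_me = 7:9, b_tmp = 10:12)

# Collect the names of every column whose name contains "tmp"
drop_cols <- grep("tmp", names(dt), value = TRUE)

# Assigning NULL with := removes those columns by reference;
# the parentheses tell data.table to evaluate drop_cols rather than
# treat it as a literal column name
dt[, (drop_cols) := NULL]

print(names(dt))
```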

Exercise 5
Add a new column which is the squared deviation of price from the average price of cars of the same color.
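
The general pattern of adding a column computed within groups can be sketched as follows. This is a toy example with hypothetical columns `grp` and `value`, not the exercise data.

```r
library(data.table)

# Toy data; `grp` and `value` are hypothetical column names
dt <- data.table(
  grp   = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20)
)

# := adds the column by reference; because of `by`, mean(value) is the
# mean within each group rather than the overall mean
dt[, sq_dev := (value - mean(value))^2, by = grp]

print(dt)
```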

Exercise 6
Use a combination of .SDcols and lapply to get the mean value of columns 18 through 35.
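
For readers who have not yet combined the two, a minimal sketch on a four-column toy table might look like this. The column names and positions are made up.

```r
library(data.table)

# Toy data; the columns x, y and z stand in for a block of dummy variables
dt <- data.table(id = 1:4, x = c(0, 1, 1, 0), y = c(1, 1, 1, 0), z = c(0, 0, 0, 1))

# .SD contains only the columns selected by .SDcols (here positions 2 to 4);
# lapply() then applies mean() to each of them
col_means <- dt[, lapply(.SD, mean), .SDcols = 2:4]

print(col_means)
```

`.SDcols` also accepts a character vector of column names, which can be more robust than positions if the column order changes.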

Exercise 7
Print the most common color by age in years.

Learn more about the data.table package in the online course R Data Pre-Processing & Data Management – Shape your Data! In this course you will learn how to:

work with different data manipulation packages,
import, transform and prepare your dataset for modelling,
and much more.

Exercise 8
For the dummy variables in columns 18:35, recode 0 to -1. You might want to use the set() function here.
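
A sketch of how set() can be used in a loop over columns is shown below, using a small invented table. The columns `d1` and `d2` are hypothetical integer dummies.

```r
library(data.table)

# Toy data; d1 and d2 are hypothetical integer dummy columns
dt <- data.table(id = 1:4, d1 = c(0L, 1L, 0L, 1L), d2 = c(1L, 1L, 0L, 0L))

# Loop over column positions; set() overwrites the selected rows (i)
# of column j by reference, avoiding the overhead of [.data.table
for (j in 2:3) {
  set(dt, i = which(dt[[j]] == 0L), j = j, value = -1L)
}

print(dt)
```

The replacement value is written as `-1L` so that it matches the integer type of the toy columns.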

Exercise 9
Use the set() function to add “yuck!” to the variable Fuel_Type if it is not petrol. Just because…

Exercise 10
Using .SDcols and one command, create two new variables: the logs of Weight and Price.
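
The idea of creating several columns in a single := call can be sketched on toy data as follows. The columns `a` and `b` and the "log_" prefix are assumptions made for the example.

```r
library(data.table)

# Toy data; a, b and label are hypothetical column names
dt <- data.table(a = c(1, 10, 100), b = c(2, 4, 8), label = c("x", "y", "z"))

cols     <- c("a", "b")
new_cols <- paste0("log_", cols)

# lapply(.SD, log) returns one transformed vector per column in .SDcols;
# (new_cols) := assigns them to the new column names in the same order
dt[, (new_cols) := lapply(.SD, log), .SDcols = cols]

print(dt)
```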

(Painting by José de Almada)
