Beyond the basics of data.table: Smooth data exploration
This exercise set provides practice using the fast and concise data.table package. If you are new to the syntax, it is recommended that you start by solving the set on the basics of data.table before attempting this one.
We will use data on used cars (Toyota Corollas) on sale during 2004 in the Netherlands. There are 1436 observations with information on the price at which each car is offered for sale, its age, its mileage and more; see the full variable description here.
Answers are available here.
Exercise 1
Load the data into your working environment using fread(). Don’t forget to load the data.table package first.
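As a rough illustration of the loading step, the sketch below reads a small CSV with fread(). The temporary file and its columns (Model, Price, Age) are hypothetical stand-ins, not the actual used-car file, so substitute the path to the data you downloaded.

```r
library(data.table)

# Hypothetical stand-in for the real file: a tiny CSV written to a temp path.
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("Model,Price,Age",
    "Corolla A,13500,23",
    "Corolla B,13750,24"),
  toy_path
)

# fread() detects the separator and column types and returns a data.table.
cars <- fread(toy_path)
str(cars)
```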
Exercise 2
Using one line of code, print out the most common car model in the data and the number of times it appears.
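One possible pattern for this kind of question is to count rows per group with .N, sort by the count, and keep the first row. The sketch below uses a made-up toy table; the column name Model is an assumption and may differ from the real data.

```r
library(data.table)

# Toy data with an assumed column name; not the real used-car data.
toy <- data.table(Model = c("A", "B", "B", "C", "B", "A"))

# Count rows per model, sort by descending count, keep the top row.
top <- toy[, .N, by = Model][order(-N)][1]
print(top)
```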
Exercise 3
Print out the mean and median price of the 10 most common models.
Exercise 4
Delete all columns that have Guarantee in their names.
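A minimal sketch of deleting columns by name pattern: grep() collects the matching names and `:= NULL` removes them by reference. The column names in the toy table are invented for illustration.

```r
library(data.table)

# Toy table with hypothetical column names.
toy <- data.table(
  Price = c(100, 200),
  Mfr_Guarantee = c(0L, 1L),
  Guarantee_Period = c(3L, 12L),
  Age = c(23L, 30L)
)

# Collect every column whose name contains "Guarantee" ...
drop_cols <- grep("Guarantee", names(toy), value = TRUE)

# ... and remove them in place; the parentheses make data.table
# treat drop_cols as a vector of names rather than a new column name.
toy[, (drop_cols) := NULL]
names(toy)
```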
Exercise 5
Add a new column which is the squared deviation of price from the average price of cars of the same color.
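Group-wise columns like this can be added with `:=` together with `by`, so that mean() is evaluated within each group. The following is only a sketch on toy data; the column names Color and Price and the new name sq_dev are assumptions.

```r
library(data.table)

# Toy data; column names are assumed for illustration.
toy <- data.table(
  Color = c("Red", "Red", "Blue", "Blue"),
  Price = c(10, 20, 30, 50)
)

# Within each color, mean(Price) is the group mean, so each row gets
# its squared deviation from the average price of its own color.
toy[, sq_dev := (Price - mean(Price))^2, by = Color]
print(toy)
```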
Exercise 6
Use a combination of .SDcols and lapply to get the mean value of columns 18 through 35.
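The general shape of the .SDcols and lapply idiom might look like the sketch below. It uses a toy table with two dummy columns in positions 2 and 3 rather than the real columns 18 through 35, and the column names are invented.

```r
library(data.table)

# Toy table: an id column followed by two hypothetical dummy columns.
toy <- data.table(
  id = 1:4,
  ABS = c(0L, 1L, 1L, 1L),
  Airbag = c(1L, 1L, 0L, 0L)
)

# .SDcols restricts .SD to the chosen columns (here by position),
# and lapply() applies mean() to each of them.
means <- toy[, lapply(.SD, mean), .SDcols = 2:3]
print(means)
```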
Exercise 7
Print the most common color by age in years.
Exercise 8
For the dummy variables in columns 18:35, recode 0 to -1. You might want to use the set function here.
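To show how set() can be used in a loop over columns, here is a small sketch on toy data. The table and its column positions (2:3 instead of 18:35) are assumptions made for the example.

```r
library(data.table)

# Toy table with two hypothetical integer dummy columns.
toy <- data.table(
  id = 1:4,
  ABS = c(0L, 1L, 1L, 0L),
  Airbag = c(1L, 0L, 0L, 1L)
)

dummy_cols <- 2:3

# set() updates by reference with low overhead, which suits loops:
# for each column, overwrite the rows that hold 0 with -1.
for (j in dummy_cols) {
  set(toy, i = which(toy[[j]] == 0L), j = j, value = -1L)
}
print(toy)
```

Using -1L keeps the columns as integers; if the dummies were stored as doubles, -1 would be the matching value.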
Exercise 9
Use the set function to add “yuck!” to the variable Fuel_Type if it is not petrol. Just because…
Exercise 10
Using .SDcols and one command, create two new variables: the log of Weight and the log of Price.
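Several columns can be created in one call by assigning a list to a character vector of new names. The sketch below assumes toy values and the new names log_Weight and log_Price, which are illustrative choices and not prescribed by the exercise.

```r
library(data.table)

# Toy data; the values are made up.
toy <- data.table(
  Weight = c(1000, 1100),
  Price = c(10000, 12000)
)

# lapply(.SD, log) returns one logged column per .SDcols entry;
# := assigns them, in order, to the names on the left-hand side.
toy[, c("log_Weight", "log_Price") := lapply(.SD, log),
    .SDcols = c("Weight", "Price")]
print(toy)
```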
(Painting by José de Almada)