Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Using the knowledge you acquired in the previous exercises on sampling and selecting (here), we will now go through an entire data analysis process. You will lean on what you already know to solve the problems. Don’t worry: it might look intimidating, but follow the sequence and you will see that modeling a decision tree is the best decision you made today. We will take you through all stages of the data pipeline: data loading, feature selection, sampling, plotting, modeling, and evaluating a decision tree.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Use the read.csv() command to load the lenses.csv data and store it in lens. Use the str() command to inspect lens. Download the dataset from here.
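As a rough illustration of loading a file without a header row, the sketch below writes a small hypothetical CSV to a temporary file and reads it back. The file contents and the name toy are made up for this example and are not the lenses data; header = FALSE is an assumption that fits a file whose first row is data.

```r
# Hypothetical toy file standing in for a headerless CSV (not the real lenses.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,1,1,1,1", "2,1,1,1,2", "3,1,1,2,1"), tmp)

# header = FALSE keeps the first row as data; R then names the columns V1, V2, ...
toy <- read.csv(tmp, header = FALSE)

# str() shows the dimensions, column names, and column types.
str(toy)
```

With the default header = TRUE, the first data row would be used as column names instead.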
Exercise 2
Notice there are no column names. The column names are as follows:
index, age, spec_pres, astigmatic, tpr. Use one line of code to change the column names to these names.
Exercise 3
Given the data
age: (1) young, (2) pre-presbyopic, (3) presbyopic
spec_pres: (1) myope, (2) hypermetrope
astigmatic: (1) no, (2) yes
tpr: (1) reduced, (2) normal
class: (1) patient needs hard contact lens, (2) patient needs soft contact lens, (3) patient does not need contact lens
Type the code lens$age[lens$age == "1"]="young"
Use the same pattern to replace every numeric code with its label for the age and spec_pres variables.
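The replacement pattern can be sketched on a toy data frame. The column name size and its labels are hypothetical and chosen only to show the mechanics.

```r
# Toy data frame with a hypothetical coded column (not the lenses data).
toy <- data.frame(size = c(1, 2, 1, 3))

# The first assignment coerces the numeric column to character,
# so the remaining codes are then compared as strings.
toy$size[toy$size == "1"] <- "small"
toy$size[toy$size == "2"] <- "medium"
toy$size[toy$size == "3"] <- "large"

toy$size
```

Each line uses a logical index to pick the matching rows and overwrites only those values.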
Exercise 4
Use the str() command to see the changes. Also notice that the astigmatic column is a factor whose levels are numbers stored as text. To get all columns in the same format, let's convert it to character. Use as.character() to convert this column's data type to character. (In R 4.0.0 and later, read.csv() reads text columns as character by default, so the column may already be character.)
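A minimal sketch of the factor-to-character conversion, using a hypothetical factor rather than the astigmatic column itself:

```r
# Toy factor whose levels are digits plus one stray letter (hypothetical values).
f <- factor(c("1", "2", "g", "2"))

# as.character() returns the level labels as plain strings.
chr <- as.character(f)
class(chr)

# as.integer() on a factor returns the internal level codes,
# not the digits shown in the labels, so convert via character first.
codes <- as.integer(f)
```

Converting to character before recoding avoids the "invalid factor level" warning that R raises when a new label is assigned into a factor.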
Exercise 5
Now replace the codes in the astigmatic column with the correct labels.
Exercise 6
Use the following code to replace the 1 with “reduced” in the tpr column:
lens$tpr[lens$tpr==1]="reduced"
Now type str(lens) to see the data frame. Notice that the tpr column's data type changed from integer to character. A column can hold only one data type, so whenever you assign a character string into a numeric column, R converts the whole column to character.
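The coercion can be seen on a plain vector. The vector x below is a hypothetical stand-in for a coded column:

```r
x <- c(1L, 2L, 1L)       # integer vector
class(x)                 # "integer"

x[x == 1L] <- "reduced"  # assigning a string coerces the whole vector
class(x)                 # "character"
x                        # "reduced" "2" "reduced"
```

Note that the untouched value 2 is now the string "2", which is why later comparisons still match after the column has become character.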
Exercise 7
Go ahead and replace 2 in the tpr column with “normal”.
Exercise 8
Use the table() command to see the counts of each value in the data.
Exercise 9
Notice that there is a g in the counts. That is possibly a typo. Since only one row contains it, we can go ahead and remove that row. Hint: you can select all rows that do not have that typo and store them back in the lens data frame.
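The hint can be sketched on a toy data frame. The columns id and flag are hypothetical and only illustrate row filtering with a logical condition:

```r
# Toy data frame with one stray value in the flag column (hypothetical data).
toy <- data.frame(id = 1:4, flag = c("no", "yes", "g", "no"))

# Keep only the rows whose flag is not the stray value;
# the empty slot after the comma selects all columns.
toy <- toy[toy$flag != "g", ]

table(toy$flag)
```

If the column were a factor, the removed value would still appear in table() with a count of zero until the unused level is dropped, for example with droplevels().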
Exercise 10
Great work. The index column is not necessary for our modeling purposes, so let's remove it.