Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
These exercises use the set.seed(), sample.split(), createDataPartition(), and createFolds() functions. You may also find it helpful to go over the subset() function.
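As a warm-up before the exercises, here is a minimal base-R sketch of why set.seed() matters for splitting and how subset() picks rows from a logical condition. It deliberately uses the built-in mtcars data and an arbitrary seed of 42 (both are assumptions for illustration, not part of the exercises).

```r
# Illustrative only: mtcars stands in for any data frame.
set.seed(42)                               # arbitrary seed, not one from the exercises
n <- nrow(mtcars)
idx_a <- sample(n, size = floor(0.7 * n))  # random row indices for a ~70% sample

set.seed(42)                               # resetting the seed reproduces the same draw
idx_b <- sample(n, size = floor(0.7 * n))

identical(idx_a, idx_b)                    # the two draws match

# subset() keeps the rows for which the logical condition is TRUE
in_train <- seq_len(n) %in% idx_a
train <- subset(mtcars, in_train)
test  <- subset(mtcars, !in_train)
c(train = nrow(train), test = nrow(test))
```

Note that this plain random sample ignores class labels; the functions in the exercises are meant to keep class proportions similar across the splits.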
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Load the iris data and the “caTools” package. If the package is not installed, use the install.packages() command to install it.
Exercise 2
Set the seed to 100.
Exercise 3
Use the sample.split() function with SplitRatio = 0.7 to split the dataset into two folds using the Species class. Store the result in the variable split.
Exercise 4
Use the subset() function to subset the data frame where split is TRUE. Store the result in the variable Train.
Exercise 5
Store the other 30 percent of the sample in the variable Test. Use the same subset method.
Exercise 6
Print out the number of rows in the Test and Train variables. You should see 70 percent of the data in Train and 30 percent in Test.
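The workflow in Exercises 1 to 6 can be sketched on a different dataset so the exercises themselves stay unsolved. This is only an illustration: it assumes the caTools package is installed, and the mtcars data, the cyl label column, the seed 1, the 0.75 ratio, and the names keep, train_set, and test_set are all choices made for this example.

```r
# Sketch under stated assumptions: caTools is installed; mtcars$cyl plays the
# role of the class label. None of these values come from the exercises.
library(caTools)

set.seed(1)                                   # arbitrary seed for reproducibility
keep <- sample.split(mtcars$cyl, SplitRatio = 0.75)

# sample.split() returns a logical vector with one entry per observation:
# TRUE marks rows for the first (larger) part, FALSE marks the rest.
train_set <- subset(mtcars, keep == TRUE)
test_set  <- subset(mtcars, keep == FALSE)

nrow(train_set)
nrow(test_set)

# Compare label proportions in the two parts
prop.table(table(train_set$cyl))
prop.table(table(test_set$cyl))
```

Because the split is made on the label vector rather than on row numbers, the label proportions in the two parts should be close to those in the full data.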
Exercise 7
Install and load the “caret” package.
Exercise 8
Set the seed to 500 and use the createDataPartition() function to do the same two-fold split as in Exercise 3, but with an 80:20 ratio and list = FALSE.
Exercise 9
Use the createDataPartition() function to create 5 different samples of the training data.
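For orientation, the two ways of calling createDataPartition() in Exercises 8 and 9 might look like the sketch below. It assumes the caret package is installed and again uses mtcars with factor(cyl) as a stand-in label; the seed, the value of p, and times = 3 are illustrative choices rather than the exercise answers.

```r
# Sketch under stated assumptions: caret is installed; labels come from mtcars$cyl.
library(caret)

labels <- factor(mtcars$cyl)

set.seed(7)                                   # arbitrary seed
# A single stratified partition; list = FALSE returns a one-column matrix of row indices
idx <- createDataPartition(labels, p = 0.8, list = FALSE)
part_train <- mtcars[idx, ]
part_test  <- mtcars[-idx, ]

# Several independent partitions: 'times' controls how many are drawn,
# and each column of the result is one resample of row indices.
parts <- createDataPartition(labels, p = 0.8, times = 3, list = FALSE)
dim(parts)
```

Note that the argument is spelled list in lower case; R argument names are case sensitive.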
Exercise 10
We now know how to make 2 splits and how to make 5 different samples. But what about 5 equal splits? Use the createFolds() command to make 5 equal partitions of the iris dataset. Make sure that each partition represents the Species classes as equally as possible.
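A rough sketch of how createFolds() behaves, again on stand-in data rather than iris: it assumes the caret package is installed, and the mtcars data, the seed, and k = 3 are illustrative choices only.

```r
# Sketch under stated assumptions: caret is installed; labels come from mtcars$cyl.
library(caret)

labels <- factor(mtcars$cyl)

set.seed(11)                                  # arbitrary seed
# With a factor, fold assignment is done within each class,
# so every fold gets a similar class mix where the counts allow it.
folds <- createFolds(labels, k = 3, list = TRUE, returnTrain = FALSE)

sapply(folds, length)                         # size of each held-out fold

# Class counts inside each fold
lapply(folds, function(rows) table(labels[rows]))
```

Each list element holds the row indices of one fold, and every row appears in exactly one fold, which is what distinguishes this from the repeated samples produced by createDataPartition().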