Creating Sample Datasets – Exercises
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Creating sample data is a common task performed in many different scenarios.
R has several base functions that make the sampling process quite easy and fast.
Below is an explanation of the main functions used in the current set of exercices:
1. set.seed() – Although R executes a random mechanism of sample creation, set.seed() function allows us to reproduce the exact sample each time we execute a random-related function.
2. sample() – Sampling function. The arguments of the function are:
x – a vector of values,
size – sample size
replace – Either use a chosen value more than once or not
prob – the probabilities of each value in the input vector.
3. seq()/seq.Date() – Create a sequence of values/dates, ranging from a ‘start’ to an ‘end’ value.
4. rep() – Repeat a value/vector n times.
5. rev() – Revert the values within a vector.
You can get additional explanations for those functions by adding a ‘?’ prior to each function’s name.
Answers to the exercises are available here.
If you have different solutions, feel free to post them.
Exercise 1
1. Set seed with value 1235
2. Create a Bernoulli sample of 100 ‘fair coin’ flippings.
Populate a variable called fair_coin
with the sample results.
Exercise 2
1. Set seed with value 2312
2. Create a sample of 10 integers, based on a vector ranging from 8 thru 19.
Allow the sample to have repeated values.
Populate a variable called hourselect1
with the sample results
Exercise 3
1. Create a vector variable called probs
with the following probabilities:
‘0.05,0.08,0.16,0.17,0.18,0.14,0.08,0.06,0.03,0.03,0.01,0.01’
2. Make sure the sum of the vector equals 1.
Exercise 4
1. Set seed with value 1976
2. Create a sample of 10 integers, based on a vector ranging from 8 thru 19.
Allow the sample to have repeated values and use the probabilities defined in the previous question.
Populate a variable called hourselect2
with the sample results
Exercise 5
Let’s prepare the variables for a biased coin:
1. Populate a variable called coin
with 5 zeros in a row and 5 ones in a row
2. Populate a variable called probs
having 5 times value ‘0.08’ in a row and 5 times value ‘0.12’ in a row.
3. Make sure the sum of probabilities on probs
variable equals 1.
Exercise 6
1. Set seed with value 345124
2. Create a biased sample of length 100, having as input the coin
vector, and as probabilities probs
vector of probabilities.
Populate a variable called biased_coin
with the sample results.
Exercise 7
Compare the sum of values in fair_coin
and biased_coin
Exercise 8
1. Create a ‘Date’ variable called startDate
with value 9th of February 2010 and a second ‘Date’ variable called endDate
with value 9th of February 2005
2. Create a descending sequence of dates having all 9th’s of the month between those two dates. Populate a variable called seqDates
with the sequence of dates.
Exercise 9
Revert the sequence of dates created in the previous question, so they are in ascending order and place them in a variable called RevSeqDates
Exercise 10
1. Set seed with value 10
2. Create a sample of 20 unique values from the RevSeqDates vector.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.