A Little Sampling Puzzle
[This article was first published on mickeymousemodels, and kindly contributed to R-bloggers.]
Suppose you have 10 objects from which you take a sample of size 20 (with replacement, or you’re in trouble). What’s the probability that each object was chosen at least once? Getting an answer via simulation is pleasantly easy:
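# One trial: draw k times from n objects and check whether every object appeared
f <- function(n=10, k=20) {
  x <- 1:n
  x.sample <- sample(x, size=k, replace=TRUE)
  return(length(unique(x.sample)) == n)
}
num.simulations <- 100000
# Proportions of FALSE and TRUE outcomes across all trials
table(replicate(num.simulations, f())) / num.simulations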
The TRUE entry of the resulting table should be close to 0.215, which is confirmed by the analytic solution:
# i-th inclusion-exclusion term for 10 objects and 20 draws
g <- function(i) {
  ((-1) ^ (i + 1)) * choose(10, i) * ((10 - i) / 10) ^ 20
}
1 - sum(sapply(1:9, g))
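The sum subtracted from 1 is the probability that at least one object was not sampled, computed by inclusion-exclusion: there are choose(10, i) ways to pick a set of i objects, and all 20 draws miss a given set with probability ((10 - i) / 10) ^ 20. Enjoy!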
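The same inclusion-exclusion sum works for any number of objects and any sample size. Here is a minimal sketch of that generalization; the function name p.all.sampled is made up for illustration and does not appear in the original post.

```r
# Hypothetical helper (not from the original post): probability that a sample
# of size k, drawn with replacement from n objects, contains every object.
p.all.sampled <- function(n = 10, k = 20) {
  # seq_len() keeps the index empty when n == 1, unlike 1:(n - 1)
  i <- seq_len(n - 1)
  # Inclusion-exclusion: probability that at least one object is missing.
  # The i == n term is zero for k >= 1, so the sum stops at n - 1.
  p.missing <- sum((-1) ^ (i + 1) * choose(n, i) * ((n - i) / n) ^ k)
  1 - p.missing
}

p.all.sampled()              # about 0.2147, matching the simulation
p.all.sampled(n = 2, k = 2)  # 0.5: the two draws must differ
```

Working on the whole index vector at once replaces the sapply call, and seq_len guards the n = 1 edge case. For large n the alternating terms may lose precision to cancellation, so treat this as an illustration rather than a robust implementation.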