[This article was first published on Statistical Research » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Nothing novel or unique about this problem. This just extends the problem to measure the probability to three or more people sharing the same birthday using simulation approaches.
For two people it’s fairly straight forward and with a group of about 22 people the probability that two people share the same birthday is about 0.5. For groups approaching 50 there is an extremely high probability that two people share the same birthday
When determining that three (or more) people have the same birthday the probability decreases fairly quickly compared to measuring only two people. A fairly large group would be needed to find three people with the same birthday.
Here is some R code to determine these probabilities.
n.rep = 5000 theta.val = 75 doy = seq(from=1, to=365, by=1) sim.mat = matrix(NA, nrow=theta.val, ncol=4) getProb = function(n){ q = 1 - seq(0,n-1)/365 p = 1 - prod(q) } theta.list = seq(from=2, to=75, by=1) p.graph = sapply(theta.list, getProb) fifty.fifty = which(p.graph >.5)[1] plot(p.graph, main="Probability Two People Have the Same Birthday", ylab='Probability', xlab="Number of People in Group") lines(p.graph) abline(h=.5, v=fifty.fifty) ## For matching multiple people ## Runs a little slow. If I had more time I would find a more efficient way to write this. for(i in 2:theta.val){ bday = replicate(n.rep, sample(doy, size=i, replace=T) ) bday[1,] bday.table = apply(bday, 2, table) sim.2 = ifelse( unlist( lapply(bday.table, max) ) >=2, 1, 0) sim.3 = ifelse( unlist( lapply(bday.table, max) ) >=3, 1, 0) sim.4 = ifelse( unlist( lapply(bday.table, max) ) >=4, 1, 0) sim.mat[i,1] = i sim.mat[i,2] = sum(sim.2)/length(sim.2) sim.mat[i,3] = sum(sim.3)/length(sim.3) sim.mat[i,4] = sum(sim.4)/length(sim.4) } graph.sim = t( sim.mat[,2:4] ) colnames(graph.sim) = sim.mat[,1] barplot(graph.sim[1,], ylim=c(0,1), col="red", main="Probability of Having Multiple People with the Same Birthday", xlab="People with Birthday", ylab="Probability") barplot(graph.sim[2,], ylim=c(0,1), col="blue", add=T) barplot(graph.sim[3,], ylim=c(0,1), col="black", add=T) abline(h=.5) legend("topleft", c("2","3","4"), col=c("red","blue","black"), lwd=3)
To leave a comment for the author, please follow the link and comment on their blog: Statistical Research » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.