Secret Santa
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On reddit somebody asked For n individuals, what’s the probability that the last person to pick during a round of Secret Santa name picking, will pick their own name.. “With each person picking in turn, and re-picking if they pick out their own name.‘ Two simulations were proposed. After some thinking I think it can be calculated rather easily.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Problem consideration
I think, it is possible to estimate the chance five people together draw any sequence. It is also possible to enumerate all sequences. Categorize the sequences, add their chances. Problem solved.
So, for example the sequence 2 5 1 3 4.
The first person has has 1/4 chance to select #2, as 2, 3, 4 and 5 can be selected with equal chances.
The second person has 1/4 chance to select #5, as 1, 3, 4 and 5 can be selected with equal chances.
The third person has 1/2 chance to select #1, as 1 or 4 can be selected.
The fourth person selects 3, as there is no alternative (4 results in a repick).
The fifth person selects 4, being the remaining number.
Chance to get this sequence: 1/32.
It is possible to recurse through all these trees, but why? After all we can easily enumerate all these sequences.
nn <-5
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
head(pp)
i i i
[1,] 5 4 1 3 2
[2,] 5 4 2 3 1
[3,] 5 4 1 2 3
[4,] 5 4 2 1 3
[5,] 5 3 4 1 2
[6,] 5 3 4 2 1
Then, the chances. For each column it is just 1/#possible_selections. That is one(!) statement.
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
str(la)
List of 4
$ : num [1:53] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 …
$ : num [1:53] 0.333 0.333 0.333 0.333 0.333 …
$ : num [1:53] 0.5 0.5 0.5 0.5 0.333 …
$ : num [1:53] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 1 …
Having these chances, they must be multiplied
prob <- Reduce('*',la)
sum(prob)
[1] 1
Luckily they add to 1. So, what is the chance the last person draws a 5? 0.13, exactly the same as a simulation number found on reddit.
sum(prob[pp[,nn]==nn])
[1] 0.1319444
To get a bit more results, we can put it in a function and get the answers up to 9 persons:
secsan1 <- function(nn) {
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
sum(prob[pp[,nn]==nn])
}
data.frame(n=3:9,sapply(3:9,secsan1))
n sapply.3.9..secsan1.
1 3 0.25000000
2 4 0.13888889
3 5 0.13194444
4 6 0.11277778
5 7 0.10053241
6 8 0.09049461
7 9 0.08238237
These numbers are close to one of the simulations.
The second simulation shows the chances the last person has to draw a person.
secsan2 <- function(nn) {
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
xtabs(prob ~ pp[,nn])
}
as.data.frame(secsan2(8))
pp…nn. Freq
1 1 0.10485639
2 2 0.10729167
3 3 0.11072279
4 4 0.11591789
5 5 0.12471088
6 6 0.14283588
7 7 0.20316988
8 8 0.09049461
These numbers are not equal to simulation numbers on Reddit. This is because this simulation contains an error. It reads:
selsan <- function(who,persons) { if (length(persons)==1) return(persons) sel <- sample(persons[persons!=who],1) return(c(sel,selsan(who+1,persons[persons!=sel]))) } #selsan(1,1:5) finselsan <- function(n){ selsan(1,1:n)[n] } nrep=1e4 sa <- sapply(1:nrep,function(x) finselsan(8)) table(sa)/nrep
What happens is that sample() is pulling a trick on us. Most of the time sample() gets some values and selects one of those. It is possible however, that the second last person has also himself to draw. At which point sample thinks only one number is available. To quote the manual: If
x
has length 1, is numeric (in the sense of is.numeric
) and x >= 1
, sampling via sample
takes place from 1:x
. Note that this convenience feature may lead to undesired behaviour when x
is of varying length in calls such as sample(x)
. See the examples. It is even documented. But who reads these all the time? A corrected simulation avoids this:selsan <- function(who,persons) {
if (length(persons)==1) return(persons)
if (length(persons[persons!=who])==1) return(c(persons[persons!=who],who))
sel <- sample(persons[persons!=who],1)
return(c(sel,selsan(who+1,persons[persons!=sel])))
}
finselsan <- function(n){
selsan(1,1:n)[n]
}
nrep=1e4
sa <- sapply(1:nrep,function(x) finselsan(8))
as.data.frame(table(sa)/nrep)
sa Freq
1 1 0.1033
2 2 0.1024
3 3 0.1108
4 4 0.1123
5 5 0.1270
6 6 0.1498
7 7 0.2024
8 8 0.0920
This shows the exact calculation gives similar results as two simulations.
Extensions
What are the chances for any person to get anybody? It is simple now.
nn <- 6
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
sapply(1:nn,function(x) xtabs(prob ~ factor(pp[,x],levels=1:nn)))
[,1] [,2] [,3] [,4] [,5] [,6]
1 0.0 0.24 0.2250000 0.2083333 0.1875000 0.1391667
2 0.2 0.00 0.2375000 0.2194444 0.1972222 0.1458333
3 0.2 0.19 0.0000000 0.2388889 0.2138889 0.1572222
4 0.2 0.19 0.1791667 0.0000000 0.2500000 0.1808333
5 0.2 0.19 0.1791667 0.1666667 0.0000000 0.2641667
6 0.2 0.19 0.1791667 0.1666667 0.1513889 0.1127778
The persons are columns, the rows are presents. Obviously 0 on the diagonal, except the last person. Closer to the last person, the chances get more unequal.
Conclusion
Don't do this. Get everybody to draw, then determine if a redraw is needed.
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.