[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last week I wrote:Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
In this last week I figured out how to finish this more elegantly/without too much brute force.
To reiterate, the chance within a particular round of drawing names are:
(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )
[1] 0.200 0.150 0.117 0.0917 0.075 0.367
This means that a ’round’ of secret santa with 5 persons can have 6 outcomes: The first person draws self, the second draws self, same for person three , four and five. Sixth outcome: a valid drawing of names. If a person self draws, a next round occurs. This means an infinite number of rounds and drawings may occur. The brute force solution was to follow a lot of solutions and do this again and again. But, we can calculate the probability of of a finish after twice a fail at person 1 and three times fail at person 3. It is the chance of this happening: 0.22*0.1173*0.367 times the number of ways this can happen: (2+3)!/(2!*3!). Obviously, this is a generalization of some of the formulas for binomial.
Now, to do this for all reasonable possibilities of results. The latter may be obtained e.g. by
ncalc <- 8xx <- expand.grid(0:ncalc,0:ncalc,0:ncalc,0:ncalc,0:ncalc)nrow(xx)
[1] 59049
Implementation phase 1
There are quite a number of probabilities to calculate, and as this will be wholesale, every shortcut is welcome. The main approach is to do the whole calculation on a logarithmic scale and transform back at the end.
With a bit of preparation log(factorial) is just getting a number out of a vector.
lnumb <- c(0,log(1:40))
clnumb <- cumsum(lnumb)
lfac <- function(x) clnumb[x+1]
exp(lfac(4))
[1] 24
to store logs for chances:
lp1d <- log(p.onedraw)
Combine it all into functions
lpoccur <- function(x,lpvals) {
x <- as.numeric(x)
crossprod(c(x,1),lpvals) + lfac(sum(x))-sum(sapply(x,lfac))
}
poccur <- function(x,lpvals) exp(lpoccur(x,lpvals))
And call it over the matrix xx
chances <- sapply(1:nrow(xx),function(x) poccur(xx[x,],lp1d))
newcalc.res <- as.data.frame(cbind(xx,chances))
newcalc.res <- newcalc.res[order(-newcalc.res$chances),]
head(newcalc.res)
Var1 Var2 Var3 Var4 Var5 chances
1 0 0 0 0 0 0.36666667
2 1 0 0 0 0 0.07333333
10 0 1 0 0 0 0.05500000
82 0 0 1 0 0 0.04277778
730 0 0 0 1 0 0.03361111
6562 0 0 0 0 1 0.02750000
Implementation phase 2
The idea was to not store matrix xx, but rather generate the rows on the fly, because I thought xx would be too large. But is seems that is not needed and a suitable xx fits into memory.
sum(chances)
[1] 0.9998979
Hence the calculation can be vectorised a bit more, reducing calculation time again:
ntrial <- rowSums(xx)
accu <- lfac(ntrial) + crossprod(t(as.matrix(xx)),lp1d[1:(length(lp1d)-1)])
for (i in 1:ncol(xx) ) accu <- accu-lfac(xx[,i])
accu <- accu+lp1d[length(lp1d)]
chance2 <- exp(accu)
newcalc2.res <- cbind(xx,chance2)
newcalc2.res <- newcalc2.res[order(-newcalc2.res$chance2),]
So, the values from simulation are recreated
sum(newcalc2.res$Var1*newcalc2.res$chance2)
[1] 0.5445788
sum(newcalc2.res$Var5*newcalc2.res$chance2)
[1] 0.2044005
Calculation done! In the end it was not so much a difficult or long calculation. However, a simulation is much easy to construct and clearly pointed out directions.
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.