Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious:
If a sample taken from an arbitrary distribution on {0,1}⁶ has its (0,0,0,0,0,0) elements censored, and if the marginal probabilities are known for all six components of the random vector, what is an estimate of the proportion of (missing) (0,0,0,0,0,0) elements?
Since the censoring modifies all probabilities by the same renormalisation, i.e. divides them by ρ, the probability of being different from (0,0,0,0,0,0), this probability can be estimated from the marginal frequencies of 1's in the censored sample. A component equal to 1 rules out the (0,0,0,0,0,0) vector, so each of these frequencies estimates the original and known marginal probability divided by ρ. Here is a short R code illustrating the approach that I wrote in the taxi home yesterday night:
    #generate vectors
    N=1e5
    zprobs=c(.1,.9) #iid example
    smpl=matrix(sample(0:1,6*N,rep=TRUE,prob=zprobs),ncol=6)
    pty=apply(smpl,1,sum)
    smpl=smpl[pty>0,]
    ps=apply(smpl,2,mean)
    cor=mean(ps/rep(zprobs[2],6))
    #estimated original size
    length(smpl[,1])*cor
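The argument does not rely on the six marginals being identical, only on their being known. As a minimal sketch (not taken from the original question), assume independent components with unequal marginals. The vector p, the seed, and the names full, obs, inv_rho_hat, N_hat and zero_hat are illustrative choices of mine. Each ratio of observed to known marginal estimates 1/ρ, so their average rescales the censored sample size back to the original one, and 1 minus the inverse of that average estimates the proportion of missing zeros.

```r
# Hypothetical sketch: independent components with unequal, known marginals
set.seed(42)                        # arbitrary seed, for reproducibility only
N <- 1e5                            # original (uncensored) sample size
p <- c(.05, .1, .15, .2, .1, .05)   # assumed known marginals P(X_i = 1)

# N x 6 binary matrix, filled column-wise: column i is Bernoulli(p[i])
full <- matrix(rbinom(6 * N, size = 1, prob = rep(p, each = N)), ncol = 6)

# censor the all-zero rows
obs <- full[rowSums(full) > 0, , drop = FALSE]

# observed marginals are inflated by 1/rho, so each ratio estimates 1/rho
inv_rho_hat <- mean(colMeans(obs) / p)

N_hat     <- nrow(obs) * inv_rho_hat  # estimated original sample size
zero_hat  <- 1 - 1 / inv_rho_hat      # estimated proportion of censored zeros
zero_true <- prod(1 - p)              # exact proportion, valid under independence only

c(N_hat = N_hat, zero_hat = zero_hat, zero_true = zero_true)
```

Under the independence assumed here, the exact proportion of zeros is prod(1-p), about 0.497, which gives a direct check on zero_hat. For a dependent distribution the estimator is unchanged, but this closed-form check no longer applies.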
A broader question is how many values (and which values) of the sample can be removed before this recovery gets impossible (with the same amount of information).
Filed under: Books, Kids, R Tagged: conditional probability, cross validated, mathematical puzzle, R