Site icon R-bloggers

How to pick up 3 numbers from a uniform distribution in a transparent manner?

[This article was first published on Christophe Ladroue » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Over in my previous post, I’m giving away 3 copies of my video course on ggplot2 and shiny. To win a copy, you just need to leave a comment and I will select 3 winners among the n participants at random after a deadline.

But how do I pick 3 winners such that:

The first thing that comes to mind is to run sample(n,3, replace = FALSE) and report the winners. But how do you know I actually ran this code? I could have decided on the winners well in advance and just pretended to run the code.

A way to approach this issue could be to set the random seed to some value so that anyone suspecting foul play can run the code themselves and get the same answer as me:

Select All Code:
1
2
set.seed(someSeed)
sample(n, 3, replace=FALSE)

I see at least two problems with it: 1) I could still have selected a seed that gives me the sample I eventually want, and 2) even using a function (e.g. of n the number of participants) as the seed doesn’t guarantee a uniform distribution for each player.

I came up with a plan which I think addresses both the uniform distribution over the players, and the incontestability of the selection.

First, I simplify the problem of selecting 3 winners among n participants to selecting 1 integer from a uniform distribution. This is easy: instead of choosing 3 items among n, I’m selecting 1 of the choose(n,3) possible combinations. Once I’ve sampled 1 number i, I simply use combn(n,3) to generate all the combinations and pick the ith item:
combn(n,3, simplify=FALSE)[[i]].

Second, I have a way to pick a number from a uniform distribution that’s completely transparent. To do this, I pick up a number uniformly at random in a much bigger set (1:N, with N>>choose(n,3)) and project this number back to the interval I’m interested in (1:choose(n,3)). That is, once I have a number j between 1 and N, I use

Select All Code:
1
i <- ceiling(choose(n,3) * j /N)

to find a random value uniformly distributed from 1 to choose(n,3)

Ideally you’d want N to be a multiple of choose(n,3) for every outcome to be exactly equally likely, but if N is much larger than choose(n,3), the slight difference in probability for each outcome is negligible (of the order of 1/N).

Now, how do I pick up a number in a bigger set in a transparent manner? We saw that using the seed is fraught with difficulty, so I need something else. I’m going to use something which neither I nor the players have any control over: the UK national lottery, which is a combination of 7 integers from the set {1,…,49}. More precisely, I’m doing this:

And this is it: this way, I pick up 3 winners among n and there is no way for me (or the players) to rig the selection. Here is the code I’ll run:

Select All Code:
1
2
3
4
5
6
lotteryResult <- c( , , , , , , ) # to be filled in by the actual lottery draw
nPlayers <- 200 # to be updated with the number of participants
nWinners <- 3
index <- ceiling(choose(nPlayers, nWinners) * lexicographicIndex(lotteryResult, 49) / choose(49,7))
winners <- combn(nPlayers,nWinners,simplify = FALSE)[[index]]
cat("\n The winners are:", winners)

The deadline for the competition is Wednesday 09th July at midnight UK time. The next lottery (UK lotto) draw after that is on Saturday 12th July, and I’ll use that draw to decide on the winners, using the code I’ve presented here.

What do you think? Can you poke holes in my strategy? Can you come up with something simpler?

Note about the lexicographic index
It is not terribly difficult to find the index of a combination without generating them all. All you need to do is to count the number of combinations that appeared before. For example, if the combination starts with 3, you know it comes after all the combinations that start with 1 and 2. Here is the code I wrote to go from a combination to its lexicographic index. There’s also a test function after it.

Select All Code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
# gives the index of the combination if sorted in lexicographic order, starting at 1
# lexicographicIndex(c(2,3,4), 5) = 7
# because [2,3,4] is the 7th item in the lexicographically ordered combinations of length 3 using 5 letters: 
# 1 2 3
# 1 2 4
# 1 2 5
# 1 3 4
# 1 3 5
# 1 4 5
# 2 3 4
# 2 3 5
# 2 4 5
# 3 4 5
# C. Ladroue
# combination is a sequence of unique integers between 1 and alphabetSize
 
lexicographicIndex <- function(combination, alphabetSize){
  combination <- sort(combination)
 
  combinationLength <- length(combination)
 
  index <- 1 
  for(p in 1:combinationLength){
 
    starting  <- ifelse(p == 1, 1 , combination[p-1] + 1)
    finishing <- combination[p] - 1
 
    if(starting <= finishing)
      index <- index + sum(sapply(starting:finishing, function(j) choose(alphabetSize - j, combinationLength - p)))
  }
    index 
}
 
 
lexicographicIndexTest <- function(){
  alphabetSize <- 10
  combinationLength <- 3
  x <- combn(alphabetSize, combinationLength, simplify = FALSE)
  cat("\n test all combinations with alphabet size = ",alphabetSize,"and combination length = ",combinationLength,": ",
      all(sapply(1:length(x), function(index) lexicographicIndex(x[[index]], alphabetSize) == index ))
      )
  cat("\n")
}

To leave a comment for the author, please follow the link and comment on their blog: Christophe Ladroue » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.