Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A historian, a data scientist, a programmer, a mathematician, and a philosopher discuss the question, how likely it is that a lottery draw (6
out of 49
) contains two consecutive numbers.
The historian
The historian argues that from 1955 up to 2011, there were 5026 lottery draws in Germany, every Saturday, and from 2000 on, two draws every Wednesday. In 2557
cases, there were consecutive numbers among the drawn numbers. Hence, the likelihood is around 2557/5026=51%
.
By the way, he adds, did you know how Casanova got rich while he introduced the lottery in Paris?
The junior data scientist
The data scientist argues that a lottery draw is a repeatable experiment that can be easily simulated. He fires up his R console and creates first a function that checks if a draw contains consecutive numbers:
> has.consecutives <- function(draw) any(diff(sort(draw))==1)
The function first sorts the numbers drawn, and then calculates the differences between those numbers. If there are consecutive numbers, then there must be a 1
in these differences.
Now one lottery draw can be simulated with sample.int(49,6)
. To get an estimate for the probability of the occurrence of consecutive numbers, he repeats these lottery draws very often:
> N <- 100000 > counts <- replicate(N, has.consecutives(sample.int(49,6))) > sum(counts)/N [1] 0.49498
The simulation suggests the probability is slightly smaller than 0.5
. To rule out this result was produced by chance, he calculates a confidence interval for the wanted proportion. There is – and that is charming about the statistical computing environment R – a package for that, which implements several methods for this task:
> library(binom) > with( binom.confint( sum(counts), N, method="wilson"), c(lower, upper) ) [1] 0.4918814 0.4980790
Now the theory states that if the simulation is repeated very often, in 95%
of the cases the true parameter lies in the calculated intervals. Hence the data scientist is quite certain about the first two digits of the probability.
The programmer
The programmer takes a recursive approach. If we find a function f(n,m)
that returns the number of ways to choose m out of n numbers without consecutive numbers, then we can calculate the wanted probability by
> 1 - f(49,6)/choose(49,6)
Some corner cases for f
are obvious: if m
is 1
, there are n
ways. For n=3
and m=2
, there is exactly one way. By the pigeonhole principle, when m>ceiling(n/2)
, there is no way.
In the general f(n,m)
case, we can distinguish between the ways where neither the first nor the last number is chosen (f(n-2,m)
) and the ways where either the first and/or the last element is chosen. In the latter case, by the inclusion/exclusion principle, the number of ways is 2*f(n-2, m-1) - f(n-4, m-2)
. In case m=2
, the latter term becomes 1
; this is another hard-coded return value f(n,0)=1
. Altogether, the function is defined as
> f <- function(n,m) { if(m==0) return(1) if(m==1) return(n) if(n==3 && m==2) return(1) # pigeonhole principle if(m>ceiling(n/2)) return(0) # inclusion/exclusion principle return( f(n-2,m) + 2*f(n-2, m-1) - f(n-4, m-2) ) } > 1 - f(49,6)/choose(49,6) [1] 0.4951984
The mathematician
After some thinking, the mathematician comes up with equivalent problem formulation. Instead of counting the number the draws without consecutive numbers, what if we count the number of ways to draw the blanks immediately after the drawn numbers?
Given a draw N1, ..., N6
without consecutive numbers, the numbers N1+1, ..., N6+1
after each drawn number are not drawn. We have to allow the number 50
to make this work, though.
Hence, the number of draws 6
out of 49
without consecutives corresponds to number of 6
blanks drawn from (50-6)
numbers. So the solution boils down to
> 1 - choose(50-6, 6)/choose(49,6) [1] 0.4951984
The existentialist philosopher
Finally, the philosopher points out that it is not the result, but the solution approaches that are important. The objective truth about consecutive numbers in lottery draws is a fact. Now that it is known, it might be tempting to disregard the ways used to find it. But there is value in subjective reflection, in appreciating the individual thoughts and approaches, as this reflection deepens the understanding of reality.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.