The Motivation for the Poisson Distribution

[This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers.]

# The Poisson distribution models the number of events in a fixed
# interval when the events occur independently of one another and
# at a constant average rate.  The distribution takes only one
# parameter, mu, which is equal to both the mean (the expected
# number of events) and the variance.
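
# A quick check of the mean = variance property, as a minimal sketch.
# It assumes base R's rpois(); the seed and sample size are arbitrary
# choices made here for illustration, not part of the original post.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rpois(100000, lambda = 9)    # 100,000 Poisson draws with mu = 9
pois_check <- c(mean = mean(x), var = var(x))
pois_check                        # both entries should be close to 9
```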
 
# Like any distribution, it is interesting mainly because it
# approximates a real-world phenomenon.
 
# Imagine you are trying to model the mail delivery on Wednesdays.
 
# On average you receive 9 pieces of mail.  If the mail delivery
# system is well modeled by a Poisson distribution, then the
# standard deviation of mail delivery should be sqrt(9) = 3,
# meaning that on most Wednesdays you should receive between 3
# and 15 pieces of mail (within two standard deviations of the mean).
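
# The "between 3 and 15" range can be checked directly against the
# Poisson distribution function.  This is a sketch using base R's
# ppois(); the variable names are introduced here for illustration.

```r
mu      <- 9
sd_mail <- sqrt(mu)               # Poisson variance equals mu, so sd = 3
lower   <- mu - 2 * sd_mail       # 3
upper   <- mu + 2 * sd_mail       # 15
# P(3 <= X <= 15) = P(X <= 15) - P(X <= 2)
p_within <- ppois(upper, mu) - ppois(lower - 1, mu)
p_within                          # share of Wednesdays inside the range
```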
 
# What underlying physical phenomenon must exist for this to be
# possible?
 
# To aid this discussion we will think of the Poisson
# distribution as the limiting distribution of the sum of
# outcomes from a large number of independent binary draws:
 
DrawsApprox <- function(mu, N) sum(rbinom(N,1,mu/N))
 
# The idea is that if we specify an expected number of events mu
# and a number of draws N (N >= mu, so that mu/N is a valid
# probability), then we can approximate a single draw from a
# Poisson by summing the N binary outcomes.
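
# Summing N independent 0/1 draws is the same as taking one binomial
# draw of size N.  The sketch below compares the two; DrawsBinom is a
# hypothetical helper introduced here, not part of the original post,
# and the seed, N = 72, and the number of replications are arbitrary.

```r
DrawsBinom <- function(mu, N) rbinom(1, N, mu / N)   # one binomial(N, mu/N) draw

set.seed(2)                                          # arbitrary seed
a <- replicate(10000, sum(rbinom(72, 1, 9 / 72)))    # sum of 72 binary draws
b <- replicate(10000, DrawsBinom(9, 72))             # single binomial draw
c(mean_sum = mean(a), mean_binom = mean(b),
  var_sum  = var(a),  var_binom  = var(b))           # the pairs should roughly agree
```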
 
DrawsApprox(9,9)
# In this case each draw succeeds with probability 9/9 = 1, so
# the sum is always 9 and the variance is 0.  There are 9
# letters, all of which are sent out every Wednesday.
 
# More interestingly:
DrawsApprox(9,18)
# In this case there are 18 letters that may be sent out, and
# each one is sent with probability 0.5.
 
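# We want to know the mean and variance of these draws.
# Let us write a simple function to estimate them by simulation.
evar <- function(fun, draw=100, ...) {
  f <- match.fun(fun)  # accepts a function or its name as a string
  # Call f(...) 'draw' times and collect the results in a vector
  outc <- vapply(seq_len(draw), function(i) f(...), numeric(1))
  list(outc=outc, mean=mean(outc), var=var(outc))
}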
 
evar("DrawsApprox", draw=10000, N=18, mu=9)
# I get a mean very close to 9, as we should hope, but
# interestingly the variance is less than five.  (The sum of N
# draws is binomial, with variance mu*(1 - mu/N) = 4.5 here.)
# This is less than the Poisson variance, which is 9.
 
# Let's see what happens if we double the number of
# potential letters going out, which halves the
# probability of any particular letter being sent.
evar("DrawsApprox", draw=10000, N=36, mu=9)
# Now the variance is about 6.7
 
evar("DrawsApprox", draw=10000, N=72, mu=9)
# Now the variance is about 7.7
 
evar("DrawsApprox", draw=10000, N=144, mu=9)
# Now about 8.6
 
evar("DrawsApprox", draw=10000, N=288, mu=9)
# Now about 8.65
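
# The simulated variances can be set against the exact binomial
# variance N*p*(1-p) with p = mu/N.  This is a sketch of that
# arithmetic; the data frame and its column names are chosen here
# for illustration.  Simulated values will differ slightly from run
# to run because of sampling noise.

```r
mu <- 9
N  <- c(18, 36, 72, 144, 288)
p  <- mu / N                                  # per-letter probability
theory <- data.frame(N = N, p = p,
                     mean = N * p,            # always mu
                     var  = N * p * (1 - p))  # equals mu * (1 - mu/N)
theory
```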
 
# We can see that as the number of letters gets very large,
# the mean and variance of the number of letters sent approach
# the same number, 9.  No finite number of letters is large
# enough for the variance to equal the mean exactly; the
# Poisson distribution is the limit as N goes to infinity.
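
# The convergence can also be seen in the probabilities themselves,
# not only in the variance.  The sketch below measures the largest
# gap between the binomial(N, mu/N) and Poisson(mu) probabilities
# over counts 0 to 40, using base R's dbinom() and dpois(); the grid
# of N values and the count range are arbitrary choices made here.

```r
mu <- 9
k  <- 0:40                                    # counts at which to compare
N  <- c(18, 72, 288, 1000, 100000)
# Largest absolute difference between the two probability mass functions
max_gap <- sapply(N, function(n) max(abs(dbinom(k, n, mu / n) - dpois(k, mu))))
data.frame(N = N, max_gap = max_gap)          # the gap shrinks as N grows
```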
 
# However, the didactic point of how the distribution is
# structured and when it may be appropriate to use should be
# clear.  The Poisson is a good fit when each potential event is
# independent and equally likely, the probability of each one is
# small, and the number of potential events is large (in
# principle I could receive 100 pieces of mail in a single day,
# though it would be very unlikely).
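
# How unlikely is a 100-letter day?  A sketch using the upper tail
# of base R's ppois(), assuming the mail really is Poisson with
# mu = 9; the variable name is introduced here for illustration.

```r
# P(X >= 100) = P(X > 99) for a Poisson with mean 9
p_100_or_more <- ppois(99, lambda = 9, lower.tail = FALSE)
p_100_or_more                     # positive, but vanishingly small
```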
 
bigdraw <- evar("DrawsApprox", draw=10000, N=1000, mu=9)
summary(bigdraw$outc)
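
# With N = 1000 the simulated counts should be close to Poisson
# frequencies.  The sketch below is self-contained: it uses a single
# vectorised rbinom() call as a stand-in for bigdraw$outc (an
# assumption made here so the block runs on its own) and compares
# the empirical frequencies with dpois().  The seed is arbitrary.

```r
set.seed(3)                                   # arbitrary seed
sim <- rbinom(10000, 1000, 9 / 1000)          # stand-in for bigdraw$outc
k   <- 0:25                                   # counts to tabulate
# Empirical relative frequency of each count in k
emp <- as.numeric(table(factor(sim, levels = k))) / length(sim)
cmp <- data.frame(k = k, simulated = emp, poisson = dpois(k, 9))
max(abs(cmp$simulated - cmp$poisson))         # largest discrepancy
```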
 
