The Motivation for the Poisson Distribution
[This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers.]
# The Poisson distribution models counts of events that occur
# independently of one another and are equally likely to occur.
# The distribution takes a single parameter, mu, which equals both
# the mean (the expected number of events) and the variance.

# Like every distribution, it is interesting because it is an
# approximation of a real-world phenomenon.

# Imagine you are trying to model mail delivery on Wednesdays.
# On average you receive 9 pieces of mail. If the mail delivery
# system is well modeled by a Poisson distribution, then the
# standard deviation of mail delivery should be sqrt(9) = 3,
# meaning that on most days you should receive between 3 and 15
# pieces of mail (the mean plus or minus two standard deviations).

# What underlying physical phenomenon must exist for this to be
# possible?

# To aid this discussion we will think of the Poisson distribution
# as the limiting distribution of the sum of outcomes from a
# number of independent binary draws:

DrawsApprox <- function(mu, N) sum(rbinom(N, 1, mu / N))

# The idea is that if we specify an expected number of events mu
# and a number of draws N (N >= mu), then we can approximate a
# single draw from a Poisson by summing across the binary outcomes.

DrawsApprox(9, 9)
# In this case the sum is of course always 9 and the variance is 0.
# There are 9 letters, which are always sent out every Wednesday.

# More interestingly:
DrawsApprox(9, 18)
# In this case there are 18 letters that may be sent out,
# each with a 50% chance of being sent.

# We want to know what the mean and variance are.
# Let us design a simple function to estimate them.

evar <- function(fun, draw = 100, outc = NULL, ...) {
  # Call the function named by fun a total of draw times, passing the
  # extra arguments through, and append each result to outc.
  for (i in 1:draw) outc <- c(outc, get(fun)(...))
  list(outc = outc, mean = mean(outc), var = var(outc))
}

evar("DrawsApprox", draw = 10000, N = 18, mu = 9)
# I get a mean very close to 9, as we should hope, but
# interestingly the variance is less than five.
# This is less than that of the Poisson, which is 9.
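The simulated variance can be checked against an exact value. Each call to DrawsApprox sums N independent binary draws with success probability mu/N, so the result is a binomial count with mean mu and variance N * p * (1 - p), where p = mu/N. The following is a minimal sketch of that calculation; the helper name exact_var and the vector Ns are my own and are not part of the original script.

```r
# Exact variance of a Binomial(N, mu/N) count: N * p * (1 - p) with p = mu / N.
# exact_var is a hypothetical helper name introduced for illustration.
exact_var <- function(mu, N) {
  p <- mu / N
  N * p * (1 - p)
}

# The case simulated above: 18 potential letters with a mean of 9
exact_var(9, 18)   # 4.5

# Tabulate the exact variance for a few values of N (an arbitrary selection).
# It rises toward mu = 9 as N grows but stays below it for every finite N.
Ns <- c(9, 18, 36, 72, 144, 288, 1000)
data.frame(N = Ns, variance = exact_var(9, Ns))
```

Because the formula simplifies to mu * (1 - mu/N), the shortfall relative to the Poisson variance is mu^2 / N, which is why it shrinks as the number of potential letters increases.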
# Let's see what happens if we double the number of potential
# letters going out, which halves the probability of any
# particular letter being sent.
evar("DrawsApprox", draw = 10000, N = 36, mu = 9)
# Now the variance is about 6.7
evar("DrawsApprox", draw = 10000, N = 72, mu = 9)
# Now 7.7
evar("DrawsApprox", draw = 10000, N = 144, mu = 9)
# 8.6
evar("DrawsApprox", draw = 10000, N = 288, mu = 9)
# 8.65

# We can see that as the number of letters gets very large, the
# variance of the number of letters approaches the mean, 9.
# For any finite number of letters the variance stays strictly
# below the mean, so I will never be able to choose a large enough
# number of letters for the variance to exactly equal the mean.

# However, the didactic point of how the distribution is
# structured, and when it may be appropriate to use it, should be
# clear. The Poisson is a good fit when each potential event is
# equally likely and individually unlikely, yet the number of
# potential events is large (in principle I could receive 100
# pieces of mail in a single day, though it would be very unlikely).

bigdraw <- evar("DrawsApprox", draw = 10000, N = 1000, mu = 9)
summary(bigdraw$outc)
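One way to see how close the binomial sum gets to its Poisson limit, without any simulation noise, is to compare the two probability mass functions directly. The sketch below uses base R's dbinom and dpois; the helper name max_pmf_gap and the count range 0:40 are my own choices for illustration and do not come from the original script.

```r
# Largest absolute difference between the Binomial(N, mu/N) and Poisson(mu)
# probability mass functions over the counts in k.
# max_pmf_gap is a hypothetical helper, not part of the original script.
max_pmf_gap <- function(mu, N, k = 0:40) {
  max(abs(dbinom(k, size = N, prob = mu / N) - dpois(k, lambda = mu)))
}

# Evaluate the gap for the values of N used in the simulations above
Ns <- c(18, 36, 72, 144, 288, 1000)
gaps <- sapply(Ns, function(N) max_pmf_gap(9, N))
data.frame(N = Ns, max_gap = gaps)
```

The gap is far smaller at N = 1000 than at N = 18, which matches the simulated variances moving toward 9.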