How to Remember the Poisson Distribution
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511

As the number of events increases from 0 to 15, the CDF approaches 1. See Figure.
The probability of exactly $n$ events occurring is given by the probability density function (PDF) or, more accurately since this is a discrete distribution, the probability mass function (PMF): \begin{equation} \Pr(X = n) = \dfrac{α^n}{n!} \; e^{-α}; \quad n = 0, 1, 2, \ldots \label{eqn:ppdf} \end{equation} which is just \eqref{eqn:pcdf} without the summation because only a single value of $n$ is considered. In R, the probability density is calculated using the function dpois(). Using α = 4 again, we get
> dpois(0:15,4)
 [1] 1.831564e-02 7.326256e-02 1.465251e-01 1.953668e-01 1.953668e-01 1.562935e-01 1.041956e-01
 [8] 5.954036e-02 2.977018e-02 1.323119e-02 5.292477e-03 1.924537e-03 6.415123e-04 1.973884e-04
[15] 5.639669e-05 1.503912e-05
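As a quick sanity check, the values above can be reproduced directly from \eqref{eqn:ppdf}. This is only a sketch in base R; the variable names `alpha`, `n`, and `pmf_by_hand` are my own, chosen for illustration.

```r
alpha <- 4      # mean number of events, as in the examples above
n     <- 0:15   # event counts to evaluate

# PMF computed by hand: alpha^n / n! * exp(-alpha)
pmf_by_hand <- alpha^n / factorial(n) * exp(-alpha)

# Compare with the built-in density function
all.equal(pmf_by_hand, dpois(n, alpha))

# Accumulating the PMF term by term gives the CDF reported by ppois()
all.equal(cumsum(pmf_by_hand), ppois(n, alpha))
```

Both comparisons should return TRUE, which also confirms that the CDF is just the running sum of the individual probabilities.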
The Poisson distribution is used to model such things as the number of clicks detected by a Geiger counter. It is also the most commonly assumed source of arrivals in queueing theory and computer performance analysis. In fact, it was Agner Erlang who first presented the Poisson distribution as a model of incoming telephone calls, with $\alpha = \lambda t$, in 1909 for the purpose of sizing trunkline capacity at Danish Telekom. However, for those not engaged in applying probability theory on a regular basis, the expression in \eqref{eqn:pcdf} looks formidable and hard to remember.
The trick I employ in my classes is to remember a much simpler, but wrong, version of \eqref{eqn:pcdf} and then correct it. The corrections can be regarded as a little story that is easy to remember: you’re more likely to remember a story than a formula like \eqref{eqn:pcdf}. Here’s the story.
- Start with this simple (but incorrect) expression for the CDF \begin{equation} F(α,n) \sim e^{+α} \; \times \; e^{-α} \label{eqn:1cdf} \end{equation} Clearly, \eqref{eqn:1cdf} cannot have a value bigger than 1, which is what is required of a probability. The problem, however, is that since α is a constant this equation will always be equal to 1, which is not quite what we want. For example, if α = 0: \begin{equation} e^{0} \; \times \; e^{0} = 1 \times 1 = 1 \end{equation} In general, for any positive α: \begin{equation} e^{+α} \; \times \; e^{-α} = \dfrac{e^{α}}{e^{α}} = 1 \end{equation} Clearly, this stuck version is wrong. The question is: How can we correct it?
- The factor $e^{-α}$ in \eqref{eqn:1cdf} is a decaying exponential that will approach zero for any large value of α. The problem lies with $e^{+α}$ since it will become enormous for an arbitrarily large value of α. So, we need to tame it.
- Recall that the exponential function can be written as an infinite power series: \begin{equation} e^{x} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \ldots \label{eqn:infexp} \end{equation}
- But, if we truncate the series \eqref{eqn:infexp} after the $x^n$ term \begin{equation} 1 + x + \dfrac{x^2}{2!} + \ldots + \dfrac{x^n}{n!} \label{eqn:truncexp} \end{equation} it is no longer equal to $e^{x}$ but, for positive $x$, something less. The shorthand notation for \eqref{eqn:truncexp} is \begin{equation} \sum_{k=0}^n \dfrac{x^k}{k!} \end{equation} In our case, $x$ takes the specific value α.
- The factor $e^{+α}$ in \eqref{eqn:1cdf} is now replaced with the tamed sum: \begin{equation} e^{+α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \label{eqn:sumexp} \end{equation}
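The steps of the story so far can be checked numerically. The following is only a sketch in base R: the helper `truncated_exp()` is a hypothetical name introduced here (it is not a built-in function), and the truncation point `n = 6` is an arbitrary choice for illustration. It assumes, as the text above implies, that \eqref{eqn:pcdf} is the sum of \eqref{eqn:ppdf} over $k = 0, \ldots, n$.

```r
alpha <- 4   # same mean as in the earlier examples
n     <- 6   # truncation point, chosen arbitrarily for illustration

# Naive version: equals 1 (up to floating-point rounding) for any alpha
exp(+alpha) * exp(-alpha)

# Hypothetical helper: power series for exp(x), truncated after the x^n term
truncated_exp <- function(x, n) {
  k <- 0:n
  sum(x^k / factorial(k))
}

# For positive x the truncated sum is smaller than the full exponential
truncated_exp(alpha, n) < exp(alpha)

# Replacing exp(+alpha) with the tamed sum should reproduce the Poisson CDF
truncated_exp(alpha, n) * exp(-alpha)
ppois(n, alpha)
```

With α = 4 and n = 6, the last two lines should agree with the seventh entry (0.88932602) of the ppois(0:15,4) output shown earlier.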