Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In insurance, the law of large numbers (named loi des grands nombres initially by Siméon Poisson, see e.g. http://en.wikipedia.org/…) is usually invoked to justify large portfolios, because of pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (over a given period). This holds, of course, under standard statistical assumptions, namely a finite expected value and independence (see http://freakonometrics.blog.free.fr/…. for a discussion, in French). But in insurance, catastrophes are usually rare (and extremely costly), so actuaries might be interested in modeling the occurrence of that small number of events (see e.g. Aldous’ book on that specific topic, which can be downloaded from http://stat.berkeley.edu/…). The theorem behind this is sometimes called the law of small numbers (from the book published by Ladislaus Bortkiewicz, but we’ll get back to that story later on; see also Whitaker (1914) http://biomet.oxfordjournals.org/… or the book recently published by Michael Falk, Jürg Hüsler and Rolf-Dieter Reiss).
- The Poisson distribution
The so-called Poisson distribution (see http://en.wikipedia.org/…) was introduced by Siméon Poisson in 1837 (in Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités, see http://gallica.bnf.fr/…). But it had been defined more than a century before, by Abraham De Moivre, in 1711, in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus (see e.g. the review in http://www.jstor.org/…). Let $N$ be a counting random variable; $N$ has a Poisson distribution with parameter $\lambda>0$ if
$$\mathbb{P}(N=k)=e^{-\lambda}\frac{\lambda^k}{k!},\quad k=0,1,2,\dots$$
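De Moivre obtained that distribution as an approximation of the binomial distribution. Recall that the binomial distribution is a standard distribution in actuarial science, for instance to model the number of deaths among $n$ insured individuals, each dying within the year with probability $p$, independently of the others:
$$\mathbb{P}(N=k)=\binom{n}{k}p^k(1-p)^{n-k},\quad k=0,1,\dots,n.$$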
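And if $n\to\infty$ and $p\to0$ with $np\to\lambda$, then for every $k$,
$$\binom{n}{k}p^k(1-p)^{n-k}\longrightarrow e^{-\lambda}\frac{\lambda^k}{k!},$$
i.e. the binomial distribution converges to the Poisson distribution with parameter $\lambda$.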
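A minimal numerical check of De Moivre's approximation in R, as a sketch: it compares binomial probabilities with $p=\lambda/n$ to Poisson probabilities with parameter $\lambda$, for increasing $n$. The value $\lambda=2$, the grid of $n$ and the range of $k$ are arbitrary illustrative choices, not taken from the post.

```r
# De Moivre's approximation: Binomial(n, lambda/n) versus Poisson(lambda)
lambda <- 2                       # illustrative Poisson parameter (assumption)
k <- 0:15                         # counts at which both distributions are compared
sizes <- c(10, 100, 1000, 10000)  # increasing numbers of trials n

# Largest absolute difference between the two probability mass functions
gap <- sapply(sizes, function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
})
data.frame(n = sizes, max_abs_difference = signif(gap, 3))
```

The gap shrinks as $n$ grows, which is what makes the Poisson distribution a convenient model when the number of policies is large and the individual probability small.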
- The law of small numbers
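The heuristic of the main theorem related to the Poisson distribution is the following: let $X_1,\dots,X_n$ be i.i.d. random variables and $A$ a region such that $p=\mathbb{P}(X_i\in A)$ is small; then the number of observations falling in $A$, $N=\sum_{i=1}^n\mathbf{1}(X_i\in A)$, has a binomial distribution $\mathcal{B}(n,p)$, which is close to a Poisson distribution with parameter $np$.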
The heuristic is that if we consider a large number of observations, and if we count how many are in a given (small) region, then the number of such observations is Poisson distributed.
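n=1000                                  # number of points
X=runif(n)*10-1.5                       # uniform on a 10x10 square
Y=runif(n)*10-1.5
plot(X,Y,axes=FALSE,cex=.6)
u=seq(-1,1,by=.01)                      # unit disk (area pi), in yellow
v=sqrt(1-u^2)
polygon(c(u,rev(u)),c(v,rev(-v)),col="yellow",border=NA)
I=(X^2+Y^2)<1                           # points falling inside the disk
points(X[I],Y[I],cex=.6,pch=19,col="red")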
If we run some simulations,
> n=1000
> ns=100000
> N=rep(NA,ns)
> for(s in 1:ns){
+ X=runif(n)*10-1.5
+ Y=runif(n)*10-1.5
+ I=(X^2+Y^2)<1
+ N[s]=sum(I)
+ }
> hist(N,breaks=0:60,probability=TRUE,col="yellow")
> mean(N)
[1] 31.41257
The parameter of the Poisson distribution is the number of points times the ratio of the area of the yellow disk to the area of the square, i.e. $\lambda=1000\times\pi/10^2=10\pi$:
> (lambda=10*pi)
[1] 31.41593
> lines(0:60-.5,dpois(0:60,lambda),type="b",col="red")
To get an interpretation related to insurance modeling, let
- The Poisson process
As mentioned above, the Poisson distribution appears when events occur somehow randomly and independently over time. It is then natural to study the time between two occurrences (or two claims, in an insurance context).
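For a homogeneous Poisson process with rate $\lambda$, the times between two consecutive occurrences are independent and exponentially distributed with mean $1/\lambda$, and the number of occurrences over a window of length $t$ is Poisson with parameter $\lambda t$. The sketch below assumes an arbitrary rate of 3 claims per year: it builds trajectories from exponential inter-arrival times and checks that the yearly counts have mean and variance close to 3, with frequencies close to the Poisson probabilities.

```r
set.seed(1)
lambda  <- 3      # average number of claims per year (illustrative assumption)
horizon <- 1      # observation window, in years
ns      <- 10000  # number of simulated trajectories

# Count the claims in [0, horizon] by adding exponential inter-arrival times
count_claims <- function(lambda, horizon) {
  t <- 0
  k <- 0
  repeat {
    t <- t + rexp(1, rate = lambda)  # waiting time until the next claim
    if (t > horizon) return(k)
    k <- k + 1
  }
}

N <- replicate(ns, count_claims(lambda, horizon))
c(mean = mean(N), variance = var(N), poisson_parameter = lambda * horizon)

# Empirical frequencies of the counts versus Poisson(lambda * horizon) probabilities
round(rbind(empirical = table(factor(N, levels = 0:10)) / ns,
            poisson   = dpois(0:10, lambda * horizon)), 3)
```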
- Poisson distribution, and claims occurrence
It is neither Siméon Poisson nor De Moivre, but Ladislaus von Bortkiewicz who first described the Poisson distribution as the law of small numbers. In 1898 (see http://archive.org/…), he studied the number of soldiers killed by being kicked by a horse, from 1875 till 1894, in 10 corps over 20 years, i.e. 200 corps-years.
He obtained the following distribution (here, the parameter of the Poisson distribution is 0.61, i.e. the average number of deaths per corps and per year):
Number of deaths per corps-year | Empirical counts | Poisson counts |
0 | 109 | 108.67 |
1 | 65 | 66.29 |
2 | 22 | 20.22 |
3 | 3 | 4.11 |
4 | 1 | 0.63 |
5 and more | 0 | 0.08 |
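The Poisson column can be recomputed from the empirical counts alone, as a sketch: the parameter is estimated by the empirical mean (122 deaths over 200 corps-years), and the expected counts are 200 times the Poisson probabilities.

```r
deaths      <- 0:4
corps_years <- c(109, 65, 22, 3, 1)  # empirical counts from the table above
total       <- sum(corps_years)      # 200 corps-years

# Estimate of the Poisson parameter: the empirical mean, 122 / 200
lambda <- sum(deaths * corps_years) / total

expected  <- total * dpois(deaths, lambda)                   # expected counts for 0..4
five_plus <- total * ppois(4, lambda, lower.tail = FALSE)    # expected count for 5 and more

round(data.frame(deaths, observed = corps_years, expected), 2)
round(five_plus, 2)
```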
It is possible to find a lot of cases where the Poisson distribution fits extremely well. For instance, consider the number of hurricanes that made landfall in Florida each year after 1850:
Number of hurricanes per year | Empirical frequency (years) | Poisson frequency |
0 | 30 | 27.16 |
1 | 48 | 47.99 |
2 | 37 | 42.41 |
3 | 29 | 24.98 |
4 | 8 | 11.03 |
5 | 3 | 3.90 |
6 | 3 | 1.15 |
7 | 1 | 0.29 |
8 and more | 0 | 0.08 |
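The same computation applies to the hurricane data, as a sketch that uses only the frequencies in the table (159 years in total). Since a Poisson variable has equal mean and variance, the code also compares the empirical mean with the empirical variance, which is a quick, informal check of the Poisson assumption rather than a formal test.

```r
hurricanes <- 0:7
years      <- c(30, 48, 37, 29, 8, 3, 3, 1)  # empirical frequencies from the table above
n_years    <- sum(years)                    # 159 years of observations

lambda <- sum(hurricanes * years) / n_years  # empirical mean, 281 / 159

# Poisson frequencies for 0..7, plus a last class collecting "8 and more"
expected <- n_years * c(dpois(hurricanes, lambda),
                        ppois(7, lambda, lower.tail = FALSE))

# For a Poisson distribution, variance and mean coincide
empirical_var <- sum((hurricanes - lambda)^2 * years) / (n_years - 1)
round(c(mean = lambda, variance = empirical_var), 3)
round(expected, 2)
```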
- Poisson distribution, and return period
The return period was introduced by Emil Gumbel, in hydrology, to link probabilities and durations (see e.g. http://freakonometrics.blog.free.fr/…). A decennial event has an annual occurrence probability of 1/10; 10 years is then the average waiting time before occurrence. This does not mean that the event cannot occur within the next 10 years, nor that it has to occur within 10 years. Consider a return period of $T$ years, i.e. an annual probability of occurrence $p=1/T$.
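Assuming independence between years, the probability of non-occurrence over $n$ years is $(1-1/T)^n$, so the probability of at least one occurrence over $n$ years is $1-(1-1/T)^n$: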
Number of years $n$ (rows) / return period $T$ (columns) | 10 | 20 | 50 | 100 | 200 |
10 | 65.1% | 40.1% | 18.3% | 9.6% | 4.9% |
20 | 87.8% | 64.2% | 33.2% | 18.2% | 9.5% |
50 | 99.5% | 92.3% | 63.6% | 39.5% | 22.2% |
100 | 99.9% | 99.4% | 86.7% | 63.4% | 39.4% |
200 | 99.9% | 99.9% | 98.2% | 86.6% | 63.3% |
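The diagonal of the table above is extremely interesting: there seems to be some kind of convergence towards a limiting value (here 63.2%). Indeed, the number of events observed over $n$ years has a binomial distribution with parameters $n$ and $p=1/T$. On the diagonal, $n=T$, so this is a $\mathcal{B}(n,1/n)$ distribution, close to a Poisson distribution with parameter 1, and the probability of at least one occurrence is $1-(1-1/n)^n\to1-e^{-1}\approx63.2\%$.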
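The table and the limit along its diagonal can be reproduced in a few lines, as a sketch assuming (as above) independent years with annual probability $1/T$:

```r
# Probability of at least one occurrence over n years, for a return period of rp years
n_years <- c(10, 20, 50, 100, 200)  # rows of the table
rp      <- c(10, 20, 50, 100, 200)  # columns of the table (return periods)
prob <- outer(n_years, rp, function(n, r) 1 - (1 - 1 / r)^n)
dimnames(prob) <- list(years = n_years, return_period = rp)
round(100 * prob, 1)

# Along the diagonal n = T, 1 - (1 - 1/n)^n converges to 1 - exp(-1)
n <- c(10, 100, 1000, 1e6)
cbind(n, diagonal = 1 - (1 - 1 / n)^n, limit = 1 - exp(-1))
```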
- Rare probabilities and the Poisson distribution
The Poisson distribution keeps appearing when computing probabilities of rare events. For instance, consider the probability of having at least one incident in a nuclear plant in France over a 50-year period. Assume that the annual probability of an incident in a reactor is $p=0.00005$ (one in 20,000), and that there are 80 reactors, i.e. $50\times80=4000$ reactor-years.
Of course, a linear approximation, $50\times80\times p=20\%$, is not correct (even if it was mentioned in some French newspaper, as explained in an old post http://freakonometrics.blog.free.fr/…).
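On the other hand, assuming independence between reactors and between years, the exact probability is $1-(1-p)^{4000}$, and its Poisson approximation is $1-e^{-4000p}$: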
> p=0.00005
> 1-(1-p)^(50*80)
[1] 0.1812733
> 1-exp(-50*80*p)
[1] 0.1812692
which is the probability that a Poisson random variable with parameter $\lambda=50\times80\times p=0.2$ is not null.
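To see where the linear approximation breaks down, here is a sketch comparing it with the exact probability and the Poisson approximation for several horizons. It keeps $p=0.00005$ and 80 reactors from the text; the grid of horizons is an arbitrary choice.

```r
p        <- 0.00005                   # annual probability of an incident, per reactor
reactors <- 80
horizon  <- c(10, 50, 100, 200, 500)  # years (illustrative grid)
m        <- reactors * horizon        # number of reactor-years

linear  <- m * p                      # naive linear approximation (can exceed 1)
exact   <- 1 - (1 - p)^m              # at least one incident, independent reactor-years
poisson <- 1 - exp(-m * p)            # P(N >= 1) for N ~ Poisson(m * p)
round(cbind(years = horizon, linear, exact, poisson), 4)
```

The exact and Poisson columns stay very close, while the linear approximation overstates the probability and even exceeds 1 for the longest horizon.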
Another way of looking at this problem is based on the following idea: in about 45 years of observations on roughly 450 reactors worldwide, three major accidents were observed (including Three Mile Island in 1979 and Fukushima in 2011), so the average time between accidents can be estimated at about 16 years. For a single reactor, we can then assume that the average waiting time before a major accident is 450 times 16 years, i.e. 7200 years. Equivalently, the probability of a major accident over one year, for one reactor, is 1 over 7200 (this is the idea behind the return period concept). If we assume that accidents occur randomly and independently of each other (as defined above), then the number of major accidents observed over a period of 50 years in France follows a Poisson distribution with parameter $50/(7200/80)=50\times80/7200\approx0.56$. Hence the probability of having at least one major accident over 50 years, with 80 reactors, can be estimated by $1-\exp(-50\times80/7200)$, i.e.
> 1-exp(-50*80/7200)
[1] 0.4262466
(keeping in mind all the uncertainty around the estimated waiting time before a major accident to a single reactor!).
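Since the result depends heavily on that estimated waiting time, a simple sensitivity check can help. The sketch below recomputes the probability of at least one major accident (50 years, 80 reactors) for a few waiting times; only 7200 years comes from the text, while 3600 and 14400 are hypothetical values chosen to show how much the answer moves.

```r
# Sensitivity of P(at least one major accident) to the mean waiting time per reactor
waiting_time <- c(3600, 7200, 14400)   # years; only 7200 comes from the text
lambda <- 50 * 80 / waiting_time       # Poisson parameter over 50 years and 80 reactors
prob_at_least_one <- 1 - exp(-lambda)
round(cbind(waiting_time, lambda, prob_at_least_one), 3)
```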
Arthur Charpentier
Arthur Charpentier, professor at UQaM in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.