Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today, I wanted to publish a post on generating functions, based on discussions I had with Jean-Francois while having our coffee after lunch a couple of times already. The other reason is that I publish my post while my student just finished their Probability exam (and there were a few questions on generating functions).
- A short introduction (back on a specific exercise)
In the Probability exam, I included an exercise we’ve seen in class, last week. The question is the following (question 16 in the form – in French). Let
Consider some
- if
is discrete,
- if
is (absolutely) continuous,
where
(following recent discussions with Philippe Reka, I will try to post more hand-made graphs)
Ooops. It looks like we have a discontinuity in 0. So we have to be a bit carefull here :
which can also be writen, if
This is simply the idea of saying that the overall average is a barycenter of the average per subgroup. Here,
Let us consider the three different components.
and
(since it is is a real-valued constant), and here
and
Hence,
From now on, it is just high-school level computations,
if
- Monte Carlo computations
If we are lazy (and trust me, I am extremely lazy), it is possible to use Monte Carlo simulations to compute that function,
> F=function(x) ifelse(x<0,0,1-exp(-x)/3) > Finv=function(u) uniroot(function(x) F(x)-u,c(-1e-9,1e4))$root
or (to avoid the problem of the discontinuity)
> Finv=function(u) ifelse(3*u>1,0,uniroot(function(x) + F(x)-u,c(-1e-9,1e4))$root))
Here, the inverse is simple to get, so we can faster the code using
> Finv=function(u) ifelse(3*u>1,0,-log(3*u))
Then, we use
> rF=function(n) Vectorize(Finv)(runif(n)) > M=function(t,n=10000) mean(exp(t*rF(n))) > Mtheo=function(t) (3-2*t)/(3-3*t) > u=seq(-2,1 ,by=.1) > v=Vectorize(M)(u) > plot(u,v,type="b",col='blue') > lines(u,Mtheo(u),col="red")
The problem with Monte Carlo simulations is that they should be used only if they are valid. If mean, I can compute
> set.seed(1) > M(3) [1] 5748134
Finite sum can always be computed, numerically. Even if here,
> set.seed(1) > mean(rcauchy(1000000)) [1] 0.006069028
This is related to questions I tried to ask a few years ago in a paper, where I wanted to test if
- Generating functions
Now, to get back on the begining of last course, generating functions are interesting for a lot of reasons. But first of all, let us define those function properly.
The moment generating function
This function is said to be moment generating, since if
for some
if
Now, if we use Taylor’s expansion,
and
If we look at the value of the derivative of that function at point 0, then
As we’ve seen last week in class, it is possible to define a moment generating function in higher dimension, for some random vector
for some
- From moment generating functions to characteristic functions
The problem with the moment generating function is that the function is defined (only) on some neighborhood of
Thus, an interesting idea is to consider
Thus, let
so the characteristic function always exists. Paul Lévy proved in 1925 that the characteristic function completely characterizes the distribution.
Now, if we look at it quickly, it looks like we did not change a lot of things here, and we should be able to write
If we want to do things properly, let us look at Gut (2005) for instance. Assume that
which is well defined on some strip
But it has to be defined on some neighbourhood of
- Fast Fourier Transform
Recall Euler’s formula,
Thus, we should not be surprised to see Fourier’s transform. From this formula, we can write
Using some results in Fourier analysis, we can prove that probability function satisfies (if the random variable has a Dirac mass in x)
which can also be written,
And a similar relationship can be obtained if the distribution is absolutely continuous at point
Actually, since we work with real-valued random variables, the complex area was just a detour, and we can prove that actually,
It is then possible to get the cumulative distribution function using Gil-Peleaz’s inversion formula, obtained in 1951,
Nice isn’t it. It means, anyone working on financial markets know those formulas, used to price options (see Carr & Madan (1999) for instance). And the good thing is that any mathematical or statistical software can be used to compute those formulas.
- Characteristic function and actuarial science
Now, what is the interest of all that in actuarial science ? Characteristic functions are interesting when we deal with sums of independent random variables, since the characteristic function of the sum is simple the product of the characteristic functions. They are also interesting when dealing with compound sums1. Consider the problem of computing the 99.5% quantile of the compound sum of Gamma random variable, i.e.
where
> n <- 2^20; > p <- diff(pgamma(0:n-.5,alpha,beta))
Then, the code to compute
> f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n
To compute the 99.5% quantile, we just use
> sum(cumsum(f)<.995)
That’s extremely simple, isn’it. Want me to do it for real ? Consider the following losses amounts
> set.seed(1) > X <- rexp(200,rate=1/100) > print(X[1:5]) [1] 75.51818 118.16428 14.57067 13.97953 43.60686
Let us fit a gamma distribution. We can use
> fitdistr(X,"gamma") shape rate 1.309020256 0.013090411 (0.117430137) (0.001419982)
or
> f <- function(x) log(x)-digamma(x)-log(mean(X))+mean(log(X)) > alpha <- uniroot(f,c(1e-8,1e8))$root > beta <- alpha/mean(X) > alpha [1] 1.308995 > beta [1] 0.01309016
Whatever, we have the parameters of our Gamma distribution for individual losses. And assume that the mean of the Poisson counting variable is
> lambda <- 100
Again, it is possible to use monte carlo simulations, if we can easily generate a compound sum. We can use the following generic code: first we need functions to generate the two kinds of variables of interest,
> rN.P <- function(n) rpois(n,lambda) > rX.G <- function(n) rgamma(n,alpha,beta)
then, we can use (see here for a discussion on possible codes)
> rcpd4 <- function(n,rN=rN.P,rX=rX.G){ + return(sapply(rN(n), function(x) sum(rX(x))))}
If we generate one million variables, we can get an estimator for the quantile,
> set.seed(1) > quantile(rcpd4(1e6),.995) 99.5% 13651.64
Another idea is to remember a proporty of the Gamma distribution: a sum of independent Gamma distributions is still Gamma (with additional assumptions on the parameters, but here we consider identical Gamma distributions). Thus, it is possible to compute the cumulative distribution function of the compound sum,
> F <- function(x,lambda=100,nmax=1000) {n <- 0:nmax + sum(pgamma(x,n*alpha,beta)*dpois(n,lambda))}
(or at least a approximation). If we invert that function, we get our quantile
> uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root [1] 13654.43
Which is consistent with our monte carlo computation. Now, we can also use fast Fourier transform here,
> n <- 2^20; lambda <- 100 > p <- diff(pgamma(0:n-.5,alpha,beta)) > f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n > sum(cumsum(f)<.995) [1] 13654
Now, if it is simple, is it efficient ? Let us compare for instance computation time to get those three outputs,
> system.time(quantile(rcpd4(1e5),.995)) user system elapsed 2.453 0.106 2.611 > system.time(uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root) user system elapsed 0.041 0.012 0.361 > system.time(sum(cumsum(Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n)<.995)) user system elapsed 0.527 0.020 0.560
Computations here are comparable with the (numerical) inversion of the cumulative distribution function. Except that here, we were lucky: if the distribution is not Gamma but log normal, the second algorithm cannot be used.
1. This numerical example is taken from the first chapter of Computational Actuarial Science with R, to appear in a few months.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.