Generating functions

Posted on November 8, 2013 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers.]

Today, I wanted to publish a post on generating functions, based on discussions I have had a couple of times with Jean-Francois over coffee after lunch. The other reason is that I am publishing this post just as my students have finished their Probability exam (and there were a few questions on generating functions).

A short introduction (back on a specific exercise)

In the Probability exam, I included an exercise we’ve seen in class, last week. The question is the following (question 16 in the form – in French). Let $F(x)=1-e^{-x}/3$ for $x\geq 0$ and $F(x)=0$ for $x<0$ be the cumulative distribution function of some random variable $X$ , i.e. $F(x)=\mathbb{P}(X\leq x)$ . What is the moment generating function of $X$ , i.e. $M(t)=\mathbb{E}(e^{tX})$ ?

Consider some $t\in\mathbb{R}$ (we’ll see later on if some additional constraints are necessary). The tricky part of this exercise appears extremely fast, actually: how could you write $\mathbb{E}(g(X))$ ? I mean, in any probability textbook, the standard answer is

if $X$ is discrete,

$\mathbb{E}(g(X))=\sum_x g(x)\cdot \mathbb{P}(X=x)$

if $X$ is (absolutely) continuous,

$\mathbb{E}(g(X))=\int g(x)\cdot f(x)dx$

where $f(\cdot)$ is the density of $X$ . Here, $X$ is clearly not a discrete variable. But is it (absolutely) continuous? My (strong) belief is that you need to plot that distribution function to see what it looks like, $x\mapsto F(x)$ , for all $x\in\mathbb{R}$

(following recent discussions with Philippe Reka, I will try to post more hand-made graphs)

Ooops. It looks like we have a discontinuity at 0. So we have to be a bit careful here: $X$ is neither continuous nor discrete. Let us use the law of total expectation (the double projection formula),

$\mathbb{E}(Y)=\mathbb{E}(\mathbb{E}(Y\vert Z))$

which can also be written, if $Z\in\{A,B\}$ ,

$\mathbb{E}(Y)=\mathbb{P}(Z=A)\cdot \mathbb{E}(Y\vert Z=A)+\mathbb{P}(Z=B)\cdot \mathbb{E}(Y\vert Z=B)$

This is simply the idea of saying that the overall average is a barycenter of the averages per subgroup. Here, $Y=g(X)$ and let $A=\{X=0\}$ while $B=\{X>0\}$ (note that $\mathbb{P}(A\cup B)=1$ ). Thus,

$\mathbb{E}(g(X))=\mathbb{P}(X=0)\cdot \mathbb{E}(g(X)\vert X=0)+\mathbb{P}(X>0)\cdot \mathbb{E}(g(X)\vert X>0)$

Let us consider the three different components.

$\mathbb{P}(X=0)=F(0)=1-1/3=2/3$

$\mathbb{P}(X>0)=1-\mathbb{P}(X=0)=1/3$

and

$\mathbb{E}(g(X)\vert X=0)=g(0)$

(since it is a real-valued constant), and here $g(0)=e^{0}=1$ . So finally, we should compute $\mathbb{E}(g(X)\vert X>0)$ . Observe that $X$ given $X>0$ is an (absolutely) continuous random variable, with a density. To get it, observe that for all $x>0$ ,

$\overline{F}_\star(x)=\mathbb{P}(X>x\vert X>0)=\frac{\mathbb{P}(X>x)}{\mathbb{P}( X>0)}=\frac{e^{-x}/3}{e^{-0}/3}=e^{-x}$

and $f_\star(x)=e^{-x}$ , i.e. $X$ given $X>0$ has an exponential distribution.

Hence, $X$ is a mixture of an exponential variable and a Dirac mass at $0$ . This was actually the tricky part of the question since it is not obvious when we see (only) the formula above.

From now on, it is just high-school level computations,

$\mathbb{E}(g(X)\vert X>0)=\int_0^\infty g(x) f_\star (x)dx=\int_0^\infty e^{(t-1)\ x} dx=\frac{1}{1-t}$

if $t<1$ (the integral diverges for $t\geq 1$ ; for the first time, we see that the function is not defined everywhere). If we put all the expressions together,

$M(t)=\frac{2}{3}\cdot 1 + \frac{1}{3}\cdot \frac{1}{1-t}=\frac{3-2t}{3-3t}$
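A quick way to double-check that closed form is to redo the computation numerically, keeping the two components of the mixture separate. The sketch below is only a sanity check under the decomposition derived above (a mass of 2/3 at 0 and, with probability 1/3, an exponential variable with rate 1); the names Mtheo and Mnum are mine.

```r
# Closed form derived above, valid for t < 1
Mtheo <- function(t) (3 - 2 * t) / (3 - 3 * t)

# Mixture form: mass 2/3 at 0, plus 1/3 times the Exp(1) component,
# whose contribution is integrated numerically
Mnum <- function(t) {
  cont <- integrate(function(x) exp(t * x) * exp(-x),
                    lower = 0, upper = Inf)$value
  2/3 * exp(t * 0) + 1/3 * cont
}

tt <- c(-2, -1, 0, 0.5)
cbind(t = tt, closed = Mtheo(tt), numeric = sapply(tt, Mnum))
```

The two columns should agree up to the tolerance of integrate(), as long as all values of t stay below 1.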

Monte Carlo computations

If we are lazy (and trust me, I am extremely lazy), it is possible to use Monte Carlo simulations to compute that function,

> F=function(x) ifelse(x<0,0,1-exp(-x)/3)
> Finv=function(u) uniroot(function(x) F(x)-u,c(-1e-9,1e4))$root

or (to avoid the problem of the discontinuity, returning 0 whenever $u\leq 2/3=F(0)$ )

> Finv=function(u) ifelse(u<=2/3,0,uniroot(function(x)
+ F(x)-u,c(0,1e4))$root)

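Here, the inverse is simple to get in closed form, so we can speed up the code. The version below inverts the survival function $1-F$ instead of $F$ , which is equivalent for simulation purposes since $1-U$ is uniform when $U$ is,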

> Finv=function(u) ifelse(3*u>1,0,-log(3*u))

Then, we use

> rF=function(n) Vectorize(Finv)(runif(n))
> M=function(t,n=10000) mean(exp(t*rF(n)))
> Mtheo=function(t) (3-2*t)/(3-3*t)
> u=seq(-2,1 ,by=.1)
> v=Vectorize(M)(u)
> plot(u,v,type="b",col='blue')
> lines(u,Mtheo(u),col="red")
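Before trusting the graph, it might be worth checking that the generator really produces the mixture we derived. Here is a minimal sketch, assuming the closed-form Finv given above; the seed and the sample size are arbitrary choices of mine.

```r
set.seed(123)

# Closed-form generator from the post (inverse of the survival function)
Finv <- function(u) ifelse(3 * u > 1, 0, -log(3 * u))
x <- Finv(runif(1e5))

# Share of exact zeros: should be close to P(X = 0) = 2/3
prop_zero <- mean(x == 0)

# Mean of the positive part: should be close to 1, the mean of an Exp(1)
mean_pos <- mean(x[x > 0])

c(prop_zero = prop_zero, mean_pos = mean_pos)
```

Note that ifelse() is already vectorised here, so the Vectorize() wrapper is only needed for the uniroot-based versions.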

The problem with Monte Carlo simulations is that they should be used only if they are valid. I mean, I can compute

> set.seed(1)
> M(3)
[1] 5748134

A finite sum can always be computed numerically, even if here $\mathbb{E}(e^{3X})$ does not exist (or to be more precise, is not finite). It is like the average of a Cauchy sample… I can always compute it, even if the expected value does not exist…

> set.seed(1)
> mean(rcauchy(1000000))
[1] 0.006069028
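One way to see that such an average is meaningless is to repeat the experiment: the mean of $n$ i.i.d. standard Cauchy variables is again standard Cauchy, so the spread of the sample means does not shrink with $n$ . The sketch below compares this with Gaussian samples; the number of replications, the sample size and the seed are arbitrary choices of mine.

```r
set.seed(2013)
n_rep <- 200   # number of repeated experiments
n_obs <- 1e4   # size of each sample

# Sample means over repeated experiments
means_cauchy <- replicate(n_rep, mean(rcauchy(n_obs)))
means_normal <- replicate(n_rep, mean(rnorm(n_obs)))

# Interquartile range of the sample means: it stays of order 2 for the
# Cauchy (the IQR of a standard Cauchy), and is of order 1/sqrt(n_obs)
# for the Gaussian
c(IQR_cauchy = IQR(means_cauchy), IQR_normal = IQR(means_normal))
```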

This is related to questions I tried to ask a few years ago in a paper, where I wanted to test if $X\in L_1$ (or not). Almost all the tests I know are actually based on that assumption… But this is not the point here. My point is that those generating functions are interesting, when they exist. And perhaps working with characteristic functions is a better idea.

Generating functions

Now, to get back to the beginning of the last course, generating functions are interesting for a lot of reasons. But first of all, let us define those functions properly.

The moment generating function $M_X(t)=\mathbb{E}(e^{tX})$ exists if it is finite on a neighbourhood of $0$ (there is an $a>0$ such that for all $t\in[-a,+a]$ , $M_X(t)<\infty$ ). In that case, there exists some (open) interval $(a,b)\subset\overline{\mathbb{R}}$ such that for all $t\in(a,b)$ , $M_X(t)<\infty$ , called the convergence strip of the moment generating function.

This function is said to be moment generating, since if $M_X(\cdot)$ exists (as defined in the previous paragraph), then all moments exist: for all $k\in \mathbb{N}\backslash\{0\}$ , $\mathbb{E}\left(\vert X\vert^k\right)<\infty$ . This is basically due to the fact that, for all $k\in \mathbb{N}\backslash\{0\}$ and $t\neq 0$ , $x^k\exp(-\vert t\vert x)\rightarrow 0$ as $x\rightarrow\infty$ , so, for all $x$ large enough, $x^k \leq \exp(\vert t\vert x)$ . And for smaller values of $x$ , it is always possible to use a multiplicative constant,

$\left\{\begin{array}{l}\text{if }x>x_0, \ \vert x\vert^k \leq \exp(\vert t x\vert)\\\text{if }x\leq x_0, \ \vert x\vert^k \leq K \cdot \exp(\vert t x\vert)\end{array}\right.$

for some $K\geq 1$ . Thus,

$\mathbb{E}\left(\vert X\vert^k\right) \leq K\cdot\mathbb{E}\left( e^{\vert tX \vert}\right)\leq K\cdot\left[\mathbb{E}\left( e^{- tX}\right)+\mathbb{E}\left( e^{ tX}\right)\right]<\infty$

if $t$ is small enough (namely $[-t,+t]$ is contained in the convergence strip).

Now, if we use Taylor’s expansion,

$M_X(t)=\mathbb{E}\left( e^{ tX}\right)=\mathbb{E}\left(\sum_{k=0}^\infty \frac{(tX)^k}{k!}\right)=\sum_{k=0}^\infty\frac{t^k}{k!}\mathbb{E}[X^k]$

and

$\frac{\partial^k M_X(t)}{\partial t^k} =\mathbb{E}\left( X^k e^{tX} \right)$

If we look at the value of the derivative of that function at point 0, then

$\left. \frac{\partial^k M_X(t)}{\partial t^k}\right\vert_{0} =\mathbb{E}\left( X^k \right)$
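To make this concrete, we can go back to the function obtained in the exercise, $M(t)=(3-2t)/(3-3t)$ , and let R differentiate it symbolically. This is just a sketch: the object names are mine, and D() from the stats package is used only because the expression is simple enough for it.

```r
# Moment generating function of the exercise, as an unevaluated call
M_call <- quote((3 - 2 * t) / (3 - 3 * t))

# First and second derivatives with respect to t (symbolic)
dM1 <- D(M_call, "t")
dM2 <- D(dM1, "t")

# Evaluated at t = 0, they give E(X) and E(X^2)
m1 <- eval(dM1, list(t = 0))
m2 <- eval(dM2, list(t = 0))

c(mean = m1, second_moment = m2, variance = m2 - m1^2)
```

The values can be cross-checked by hand from the mixture: $\mathbb{E}(X)=\frac{1}{3}\cdot 1$ and $\mathbb{E}(X^2)=\frac{1}{3}\cdot 2$ , since an exponential variable with rate 1 has first two moments 1 and 2.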

As we’ve seen last week in class, it is possible to define a moment generating function in higher dimension, for some random vector $\boldsymbol{X}=(X_1,\cdots,X_d)$ ,

$M_{\boldsymbol{X}}(\boldsymbol{t})=\mathbb{E}\left(e^{\boldsymbol{t}^{\top}\boldsymbol{X}}\right)=\mathbb{E}\left(e^{t_1X_1+\cdots+t_dX_d}\right)$

for some $\boldsymbol{t}\in\mathbb{R}^d$ . It is again a moment generating function since cross derivatives (taken at point $\boldsymbol{0}$ ) are cross-moments. For instance,

$\left. \frac{\partial^2 M_{\boldsymbol{X}}(\boldsymbol t)}{\partial t_i \partial t_j}\right\vert_{\boldsymbol{0}} =\mathbb{E}\left( X_iX_j \right)$

So, moment generating functions are interesting if you want to derive moments of a given distribution. Another interesting feature is that the moment generating function (under certain conditions) fully characterizes the distribution of the random variable, in the sense that if for some $h>0$ , $M_X(t)=M_Y(t)$ for all $t\in(-h,+h)$ , then $X\overset{\mathcal{L}}{=}Y$ .

From moment generating functions to characteristic functions

The problem with the moment generating function is that the function is defined (only) on some neighborhood of $0$ , and we should be careful. The other problem is that it exists only for distributions with finite moments of all orders (i.e. $X\in L_p$ for all $p\geq 1$ ), which might be a strong assumption.

Thus, an interesting idea is to consider $\mathbb{E}\left( e^{tX} \right)$ not on the real line, but on the imaginary line.

Thus, let $\phi_X(t)=\mathbb{E}\left( e^{i tX} \right)$ for some $t\in\mathbb{R}$ . Actually, not some, but all $t\in\mathbb{R}$ , since

$\vert\phi_X(t) \vert=\left\vert\int e^{i tx} dF_X(x) \right\vert\leq \int \vert e^{i tx} \vert dF_X(x) =\int dF_X(x)=1$

so the characteristic function always exists. Paul Lévy proved in 1925 that the characteristic function completely characterizes the distribution.
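We can illustrate this on the mixture of the first section, for which the same computation as before gives $\phi_X(t)=\frac{2}{3}+\frac{1}{3}\cdot\frac{1}{1-it}$ . The sketch below estimates the characteristic function by Monte Carlo; contrary to what we observed with $M(3)$ , the estimate cannot blow up, since we average complex numbers of modulus 1. The names phi_hat and phi_theo, the seed and the grid of values of $t$ are my own choices.

```r
set.seed(42)

# Sample from the mixture, with the closed-form generator used earlier
Finv <- function(u) ifelse(3 * u > 1, 0, -log(3 * u))
x <- Finv(runif(1e5))

# Empirical and theoretical characteristic functions
phi_hat  <- function(t) mean(exp(1i * t * x))
phi_theo <- function(t) 2/3 + (1/3) / (1 - 1i * t)

tt  <- c(-5, -1, 0.5, 3, 10)
est <- sapply(tt, phi_hat)

# The modulus never exceeds 1, and the error stays small for every t
cbind(t = tt, modulus = Mod(est), error = Mod(est - phi_theo(tt)))
```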

Now, if we look at it quickly, it looks like we did not change a lot of things here, and we should be able to write

$\phi_X(t)=M_X(i t)$

If we want to do things properly, let us look at Gut (2005) for instance. Assume that $M_X(\cdot)$ is defined on some interval $(-a,+a)$ . It is then possible to define a function $\Gamma_X:\mathbb{C}\to \mathbb{C}$ (this time, it is no longer a real-valued function) as

$\Gamma_X(z)=\mathbb{E}(e^{z X})$

which is well defined on the strip $\{z\in\mathbb{C}, \vert\text{Re}(z)\vert<a\}$ . $\phi_X(\cdot)$ and $M_X(\cdot)$ are then the restrictions of that function to the imaginary line and to the real line, respectively. That function $\Gamma_X(\cdot)$ is holomorphic on the strip, and thus the values it takes there are fully determined by the values it takes on the real interval $(-a,+a)$ . Thus, the moment generating function will completely characterize the distribution.

But it has to be defined on some neighbourhood of $0$ . Which is not trivial actually… I mean, in non-life insurance, we see a lot of Pareto distributions, for which $\mathbb{E}(e^{tX})$ is infinite for any $t>0$ .

Fast Fourier Transform

Recall Euler’s formula,

$e^{it}=\cos(t)+i \ \sin(t)$

Thus, we should not be surprised to see Fourier’s transform. From this formula, we can write

$\phi_X(t)=\mathbb{E}\left( e^{i tX} \right) =\mathbb{E}\left( \cos[tX] \right)+i\ \mathbb{E}\left( \sin[tX] \right)$

Using some results in Fourier analysis, we can prove that the probability function satisfies (if the random variable has a Dirac mass at $x$ )

$\mathbb{P}(X=x)=\lim_{T\rightarrow \infty} \frac{1}{2T}\int_{-T}^{+T}e^{-itx}\phi_X(t)dt$

which can also be written, when $X$ takes integer values,

$f_X(x)=\frac{1}{2\pi}\int_{-\pi}^{+\pi}e^{-itx}\phi_X(t)dt$

And a similar relationship can be obtained if the distribution is absolutely continuous at point $x$ ,

$f_X(x)=\frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-itx}\phi_X(t)dt$

Actually, since we work with real-valued random variables, the complex area was just a detour, and we can prove that actually,

$f_X(x)=\frac{1}{2\pi}\int_{-\pi}^{+\pi} \text{Re}\left(e^{-itx}\phi_X(t)\right)dt$

It is then possible to get the cumulative distribution function using Gil-Pelaez’s inversion formula, obtained in 1951,

$F_X(x)=\frac{1}{2} +\frac{1}{2\pi}\int_{0}^{\infty} \frac{e^{itx}\phi_X(-t)-e^{-itx}\phi_X(t)}{it}dt$
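In numerical work, the Gil-Pelaez inversion is commonly written in a real form, $F_X(x)=\frac{1}{2}-\frac{1}{\pi}\int_0^\infty \frac{\text{Im}\left(e^{-itx}\phi_X(t)\right)}{t}dt$ , which is easy to pass to integrate(). The sketch below tries it on the standard Gaussian distribution, where the answer is known (pnorm); the function names gil_pelaez and phi_norm are mine, and nothing here is tuned for heavy-tailed or lattice distributions.

```r
# Gil-Pelaez inversion in its real form, for a characteristic function phi
gil_pelaez <- function(x, phi) {
  integrand <- function(t) Im(exp(-1i * t * x) * phi(t)) / t
  0.5 - integrate(integrand, lower = 0, upper = Inf)$value / pi
}

# Characteristic function of the N(0,1) distribution
phi_norm <- function(t) exp(-t^2 / 2)

xs <- c(-1.5, 0, 0.7, 2)
inverted <- sapply(xs, gil_pelaez, phi = phi_norm)
cbind(x = xs, inverted = inverted, exact = pnorm(xs))
```

The integrand behaves like $-x e^{-t^2/2}$ near $t=0$ , so there is no singularity to worry about in this example.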

Nice, isn’t it? Anyone working on financial markets knows those formulas, which are used to price options (see Carr & Madan (1999) for instance). And the good thing is that any mathematical or statistical software can be used to compute them.

Characteristic function and actuarial science

Now, what is the interest of all that in actuarial science? Characteristic functions are interesting when we deal with sums of independent random variables, since the characteristic function of the sum is simply the product of the characteristic functions. They are also interesting when dealing with compound sums¹. Consider the problem of computing the 99.5% quantile of a compound sum of Gamma random variables, i.e.

$S=\sum_{i=1}^N X_i$

where $X_i\sim\mathcal{G}(\alpha,\beta)$ are i.i.d. and $N\sim\mathcal{P}(\lambda)$ . The strategy is to discretize the loss amounts,

> n <- 2^20; 
> p <- diff(pgamma(0:n-.5,alpha,beta))

Then, to compute $\tilde f(s)=\mathbb{P}(S\in[s\pm1/2])$ , we use

> f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n
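As far as I can tell, that one-liner is the discrete counterpart of a classical identity. Assuming that $N$ is independent of the $X_i$ 's, conditioning on $N$ and using the probability generating function of the Poisson distribution, $\mathbb{E}(z^N)=e^{\lambda(z-1)}$ , we get

$\phi_S(t)=\mathbb{E}\left(\mathbb{E}\left(e^{itS}\vert N\right)\right)=\mathbb{E}\left(\phi_X(t)^N\right)=\exp\left(\lambda\left[\phi_X(t)-1\right]\right)$

In the code, fft(p) plays the role of $\phi_X$ on a grid of frequencies, and the inverse transform (divided by n) brings us back to probabilities.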

To compute the 99.5% quantile, we just use

> sum(cumsum(f)<.995)

That’s extremely simple, isn’t it? Want me to do it for real? Consider the following loss amounts

> set.seed(1)
> X <- rexp(200,rate=1/100)
> print(X[1:5])
[1] 75.51818 118.16428 14.57067 13.97953 43.60686

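Let us fit a gamma distribution. We can use fitdistr, from the MASS package, or solve the likelihood equation for the shape parameter directly,

> library(MASS)
> fitdistr(X,"gamma")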
      shape         rate    
  1.309020256   0.013090411 
 (0.117430137) (0.001419982)

> f <- function(x) log(x)-digamma(x)-log(mean(X))+mean(log(X))
> alpha <- uniroot(f,c(1e-8,1e8))$root
> beta <- alpha/mean(X)
> alpha
[1] 1.308995
> beta
[1] 0.01309016

Either way, we now have the parameters of our Gamma distribution for individual losses. And assume that the mean of the Poisson counting variable is

> lambda <- 100

Again, it is possible to use Monte Carlo simulations, if we can easily generate a compound sum. We can use the following generic code: first we need functions to generate the two kinds of variables of interest,

> rN.P <- function(n) rpois(n,lambda)
> rX.G <- function(n) rgamma(n,alpha,beta)

then, we can use (see here for a discussion on possible codes)

> rcpd4 <- function(n,rN=rN.P,rX=rX.G){
+ return(sapply(rN(n), function(x) sum(rX(x))))}

If we generate one million variables, we can get an estimator for the quantile,

> set.seed(1)
> quantile(rcpd4(1e6),.995)
   99.5% 
13651.64

Another idea is to remember a property of the Gamma distribution: a sum of independent Gamma variables is still Gamma (with additional assumptions on the parameters, but here we consider identical Gamma distributions). Thus, it is possible to compute the cumulative distribution function of the compound sum,

> F <- function(x,lambda=100,nmax=1000) {n <- 0:nmax
+ sum(pgamma(x,n*alpha,beta)*dpois(n,lambda))}

(or at least an approximation, since the sum is truncated at nmax). If we invert that function, we get our quantile

> uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root
[1] 13654.43

Which is consistent with our Monte Carlo computation. Now, we can also use the fast Fourier transform here,

> n <- 2^20; lambda <- 100
> p <- diff(pgamma(0:n-.5,alpha,beta))
> f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n
> sum(cumsum(f)<.995)
[1] 13654
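Since the FFT output is a full (discretized) distribution, a few sanity checks are cheap: the probabilities should sum to one, and the mean should be close to $\lambda\alpha/\beta$ . The sketch below is self-contained, so the parameters are hard-coded to rounded versions of the estimates obtained above, and I use a smaller grid ( $2^{16}$ points instead of $2^{20}$ ), which I assume is large enough here because the compound sum has very little mass that far in the tail.

```r
# Rounded versions of the fitted parameters (assumption: close enough
# to the estimates above for a sanity check)
alpha <- 1.309; beta <- 0.01309; lambda <- 100

# Discretised severity on 0, 1, ..., n-1, then compound Poisson via FFT
n <- 2^16
p <- diff(pgamma(0:n - .5, alpha, beta))
f <- Re(fft(exp(lambda * (fft(p) - 1)), inverse = TRUE)) / n

s <- 0:(n - 1)
total_mass <- sum(f)                 # should be very close to 1
mean_fft   <- sum(s * f)             # mean of the discretised compound sum
mean_theo  <- lambda * alpha / beta  # E(N) * E(X)
q995       <- sum(cumsum(f) < .995)  # 99.5% quantile, as in the post

c(total_mass = total_mass, mean_fft = mean_fft,
  mean_theo = mean_theo, q995 = q995)
```

If total_mass were visibly different from 1, that would be a sign that the grid is too short and that some mass wraps around.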

Now, if it is simple, is it efficient ? Let us compare for instance computation time to get those three outputs,

> system.time(quantile(rcpd4(1e5),.995))
       user      system     elapsed 
      2.453       0.106       2.611 
> system.time(uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root)
       user      system     elapsed
      0.041       0.012       0.361 
> system.time(sum(cumsum(Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n)<.995))
       user      system     elapsed
      0.527       0.020       0.560

Computations here are comparable with the (numerical) inversion of the cumulative distribution function. Except that here, we were lucky: if the distribution is not Gamma but log normal, the second algorithm cannot be used.
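The FFT approach, on the other hand, only needs the cumulative distribution function of individual losses, so switching to a lognormal distribution is a matter of replacing pgamma by plnorm. Here is a minimal sketch; the values of meanlog and sdlog are purely illustrative (chosen so that the mean loss is 100, they are not fitted to the data above), and the grid size is again an assumption that should be checked for heavier tails.

```r
# Illustrative lognormal severity with mean exp(meanlog + sdlog^2/2) = 100
sdlog   <- 1
meanlog <- log(100) - sdlog^2 / 2
lambda  <- 100

# Same recipe as before, with plnorm instead of pgamma
n <- 2^16
p <- diff(plnorm(0:n - .5, meanlog, sdlog))
f <- Re(fft(exp(lambda * (fft(p) - 1)), inverse = TRUE)) / n

total_mass <- sum(f)                 # sanity check, should be close to 1
mean_fft   <- sum((0:(n - 1)) * f)   # should be close to lambda * 100
q995_ln    <- sum(cumsum(f) < .995)  # 99.5% quantile of the compound sum

c(total_mass = total_mass, mean_fft = mean_fft, q995 = q995_ln)
```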

¹ This numerical example is taken from the first chapter of Computational Actuarial Science with R, to appear in a few months.
