Posterior predictive output with Stan

Markus Gesmann

7 years ago

[This article was first published on mages' blog, and kindly contributed to R-bloggers.]

I continue my Stan experiments with another insurance example. Here I am particularly interested in the posterior predictive distribution from only three data points. Or, to put it differently, I have a customer with three years of claims history and I’d like to predict the expected claims cost for the next year to set or adjust the premium.

The example is taken from section 16.17 in Loss Models: From Data to Decisions [1]. Some time ago I used the same example to get my head around a Bayesian credibility model.

Suppose the claims likelihood distribution is believed to follow an exponential distribution for a given parameter \(\Theta\). The prior parameter distribution on \(\Theta\) is assumed to be a gamma distribution with parameters \(\alpha=4, \beta=1000\):
\[\begin{aligned}\Theta & \sim \mbox{Gamma}(\alpha, \beta)\\
\ell_i & \sim \mbox{Exp}(\Theta), \; \forall i \in N
\end{aligned}\]
In this case the predictive distribution is a Pareto II distribution with density \(f(x) = \frac{\alpha \beta^\alpha}{(x+\beta)^{\alpha+1}}\) and a mean of \(\frac{\beta}{\alpha-1}=\,\)333.33.
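
For readers who want the intermediate step, the Pareto II density can be obtained by integrating the exponential likelihood against the gamma prior. The sketch below assumes that \(\beta\) is the rate parameter of the gamma distribution, which is consistent with the stated mean of 333.33:
\[\begin{aligned}
f(x) &= \int_0^\infty \theta e^{-\theta x} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta\theta} \, d\theta \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty \theta^{\alpha} e^{-(x+\beta)\theta} \, d\theta \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}}
= \frac{\alpha \beta^\alpha}{(x+\beta)^{\alpha+1}}
\end{aligned}\]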

I have three independent observations, namely losses of $100, $950 and $450. The posterior predictive expected loss is $416.67 and can be derived analytically, as shown in my previous post. Now let me reproduce the answer with Stan as well.
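
The analytical figure can be checked in a few lines of base R. This is a minimal sketch that relies on the standard conjugate update for an exponential likelihood with a gamma prior (shape plus number of observations, rate plus sum of losses). The variable names are mine, not taken from the original post.

```r
# Prior parameters and observed losses, as stated in the text
alpha  <- 4
beta   <- 1000                 # assumed to be the rate of the gamma prior
losses <- c(100, 950, 450)

# Conjugate update: gamma prior + exponential likelihood
alpha_post <- alpha + length(losses)   # 4 + 3 = 7
beta_post  <- beta + sum(losses)       # 1000 + 1500 = 2500

# Mean of the Pareto II predictive distribution: beta / (alpha - 1)
prior_pred_mean <- beta / (alpha - 1)
post_pred_mean  <- beta_post / (alpha_post - 1)

round(c(prior = prior_pred_mean, posterior = post_pred_mean), 2)
```

The posterior predictive distribution is therefore again Pareto II, now with parameters 7 and 2500, and its mean of 2500/6 matches the $416.67 quoted above.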

Implementing the model in Stan is straightforward and I follow the same steps as in my simple example of last week. However, here I am also interested in the posterior predictive distribution, hence I add a generated quantities code block.
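
The model listing itself is not reproduced in this copy of the post, so what follows is only a sketch of how such a model might look, not the original code. It assumes current Stan syntax and the rstan interface. The names `stan_code`, `stan_data`, `y` and `y_pred` are illustrative choices of mine.

```r
# Stan program: gamma prior on theta, exponential likelihood, and one
# posterior predictive draw per iteration in generated quantities.
# Variable names are illustrative, not the author's.
stan_code <- "
data {
  int<lower=0> N;
  vector<lower=0>[N] y;
  real<lower=0> alpha;
  real<lower=0> beta;
}
parameters {
  real<lower=0> theta;
}
model {
  theta ~ gamma(alpha, beta);   // beta is a rate in Stan's gamma
  y ~ exponential(theta);
}
generated quantities {
  real y_pred = exponential_rng(theta);
}
"

stan_data <- list(N = 3L, y = c(100, 950, 450), alpha = 4, beta = 1000)

# Only compile and sample when rstan is installed; compilation takes a while
if (requireNamespace("rstan", quietly = TRUE)) {
  fit <- rstan::stan(model_code = stan_code, data = stan_data,
                     iter = 5000, chains = 4, seed = 1234)
  y_pred <- rstan::extract(fit, pars = "y_pred")$y_pred
  print(mean(y_pred))                     # compare with the analytical 416.67
  print(quantile(y_pred, c(0.75, 0.90)))  # posterior predictive quantiles
}
```

Each draw of `y_pred` is generated from an exponential distribution using that iteration's sampled `theta`, so the collection of draws approximates the posterior predictive distribution. The iteration count and seed are arbitrary.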

The output shows a simulated predictive mean of $416.86, close to the analytical answer. I can also read off that the 75th percentile of the posterior predictive distribution is a loss of $542, compared with $414 from the prior predictive. That means that roughly once every four years I shouldn’t be surprised to observe a loss in excess of $500. Further, I can read off that 90% of losses are expected to be less than $950, or in other words the largest observation in my data may reflect the outcome of an event with a 1 in 10 return period.
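
Because the predictive distribution is Pareto II both before and after the update, these simulated quantiles can be compared against closed-form values. The following sketch uses base R only. The helper functions `qpareto2` and `spareto2` are hypothetical names of mine, derived from the distribution function \(F(x) = 1 - (\beta/(x+\beta))^\alpha\) implied by the density above. The posterior parameters 7 and 2500 assume the usual conjugate update.

```r
# Quantile function of the Pareto II distribution with shape alpha and
# scale beta, obtained by inverting F(x) = 1 - (beta / (x + beta))^alpha
qpareto2 <- function(p, alpha, beta) beta * ((1 - p)^(-1 / alpha) - 1)

# Survival function P(X > x)
spareto2 <- function(x, alpha, beta) (beta / (x + beta))^alpha

q75_prior <- qpareto2(0.75, alpha = 4, beta = 1000)  # prior predictive
q75_post  <- qpareto2(0.75, alpha = 7, beta = 2500)  # posterior predictive
p_gt_500  <- spareto2(500, alpha = 7, beta = 2500)   # P(loss > 500)
p_gt_950  <- spareto2(950, alpha = 7, beta = 2500)   # P(loss > 950)

round(c(q75_prior = q75_prior, q75_post = q75_post), 1)
round(c(p_gt_500 = p_gt_500, p_gt_950 = p_gt_950), 3)
```

The exact 75th percentiles come out at about 414 for the prior predictive and 548 for the posterior predictive. The probability of a loss above $500 is about 0.28 and above $950 about 0.10, in line with the one-in-four and one-in-ten readings above. The small gap to the simulated $542 is plausibly just Monte Carlo error.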

Comparing the sampling output from Stan with the analytical output gives me some confidence that I am doing the ‘right thing’.

References

[1] Klugman, S. A., Panjer, H. H. & Willmot, G. E. (2004), Loss Models: From Data to Decisions, Wiley Series in Probability and Statistics.

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] lattice_0.20-31 actuar_1.1-8 rstan_2.6.0 inline_0.3.14  
[5] Rcpp_0.11.6    

loaded via a namespace (and not attached):
[1] tools_3.2.0  codetools_0.2-11 grid_3.2.0 stats4_3.2.0

Remark

Interestingly, Petr Keil posted an article on "Survival analysis: basic terms, the exponential model, censoring, examples in R and JAGS" last week, which uses the same model in a different context.
