Hierarchical Loss Reserving with Stan
[This article was first published on mages' blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I continue with the growth curve model for loss reserving from last week’s post. Today, following the ideas of James Guszcza [2] I will add an hierarchical component to the model, by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
nlme
R package, just as James did in his paper, and then move on to Stan/RStan [6], which will allow me to estimate the full distribution of future claims payments.Last week’s model assumed that cumulative claims payment could be described by a growth curve. I used the Weibull curve and will do so here again, but others should be considered as well, e.g. the log-logistic cumulative distribution function for long tail business, see [1].
The growth curve describes the proportion of claims paid up to a given development period compared to the ultimate claims cost at the end of time, hence often called development pattern. Cumulative distribution functions are often considered, as they increase monotonously from 0 to 100%. Multiplying the development pattern with the expected ultimate loss cost gives me then the expected cumulative paid to date value.
However, what I’d like to do is the opposite, I know the cumulative claims position to date and wish to estimate the ultimate claims cost instead. If the claims process is fairly stable over the years and say, once a claim has been notified the payment process is quite similar from year to year and claim to claim, then a growth curve model is not unreasonable. Yet, the number and the size of the yearly claims will be random, e.g. if a windstorm, fire, etc occurs or not. Hence, a random effect for the ultimate loss cost across accident years sounds very convincing to me.
Here is James’ model as described in [2]:
[
begin{align}
CL_{AY, dev} & sim mathcal{N}(mu_{AY, dev}, sigma^2_{dev}) \
mu_{dev} & = Ult_{AY} cdot G(dev|omega, theta)\
sigma_{dev} & = sigma sqrt{mu_{dev}}\
Ult_{AY} & sim mathcal{N}(mu_{ult}, sigma^2_{ult})\
G(dev|omega, theta) & = 1 – expleft(-left(frac{dev}{theta}right)^omegaright)
end{align}
]The cumulative losses (CL_{AY, dev}) for a given accident year (AY) and development period (dev) follow a Normal distribution with parameters (mu_{AY, dev}) and (sigma_{dev}).
The mean itself is modelled as the product of an accident year specific ultimate loss cost (Ult_{AY}) and a development period specific parametric growth curve (G(dev | omega, theta)). The variance is believed to increase in proportion with the mean. Finally the ultimate loss cost is modelled with a Normal distribution as well.
Assuming a Gaussian distribution of losses doesn’t sound quite intuitive to me, as loss are often skewed to the right, but I shall continue with this assumption here to make a comparison with [2] possible.
Using the example data set given in the paper I can reproduce the result in R with
nlme
:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Read data | |
url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Data/ClarkTriangle.csv" | |
dat <- read.csv(url) | |
# Relabel accident years | |
dat$origin <- dat$AY-min(dat$AY)+1 | |
dat <- dat[order(dat$dev),] | |
# Add future dev years | |
nyears <- 12 | |
newdat <- data.frame( | |
origin=rep(1:10, each=nyears), | |
AY=rep(sort(unique(dat$AY)), each=nyears), | |
dev=rep(seq(from=6, to=nyears*12-6, by=12), 10) | |
) | |
newdat <- merge(dat, newdat, all=TRUE) | |
newdat <- newdat[order(newdat$dev),] | |
library(nlme) | |
start.vals <- c(ult = 5000, omega = 1.4, theta = 45) | |
w1 <- nlme(cum ~ ult*(1 - exp(-(dev/theta)^omega)), | |
fixed = list(ult~1, omega~1, theta ~ 1), | |
random = ult ~ 1 | origin, | |
weights = varPower(fixed=.5), | |
data=dat, start = start.vals) | |
summary(w1) | |
# Nonlinear mixed-effects model fit by maximum likelihood | |
# Model: cum ~ ult * (1 - exp(-(dev/theta)^omega)) | |
# Data: dat | |
# AIC BIC logLik | |
# 725.7576 735.7943 -357.8788 | |
# | |
# Random effects: | |
# Formula: ult ~ 1 | origin | |
# ult Residual | |
# StdDev: 543.0296 2.955047 | |
# | |
# Variance function: | |
# Structure: Power of variance covariate | |
# Formula: ~fitted(.) | |
# Parameter estimates: | |
# power | |
# 0.5 | |
# Fixed effects: list(ult ~ 1, omega ~ 1, theta ~ 1) | |
# Value Std.Error DF t-value p-value | |
# ult 5306.605 263.19680 43 20.16212 0 | |
# omega 1.306 0.03394 43 38.49663 0 | |
# theta 46.638 2.42193 43 19.25637 0 | |
# Correlation: | |
# ult omega | |
# omega -0.430 | |
# theta 0.668 -0.772 | |
# | |
# Standardized Within-Group Residuals: | |
# Min Q1 Med Q3 Max | |
# -1.47331314 -0.67337317 -0.04756236 0.40584781 2.94400230 | |
# | |
# Number of Observations: 55 | |
# Number of Groups: 10 | |
library(lattice) | |
xyplot(cum ~ dev | factor(AY), data=dat, layout=c(5,2), | |
main="Hierachical Growth Curve Model", | |
as.table=TRUE, xlim=range(newdat$dev), | |
scales=list(alternating=1), | |
key = list(space="top", columns=2, | |
text=list(labels=c("observation", "prediction")), | |
line=FALSE, points=list(pch=c(1,19), col=c(2,1))), | |
panel=function(x, y, subscripts, ...){ | |
panel.xyplot(x, y, t="b", pch=1, cex=0.5, col=2) | |
panel.xyplot(dat$dev[subscripts], | |
predict(w1, newdata=dat[subscripts,]), | |
t="b", pch=19, cex=0.5, col=1) | |
}) |
The fit looks pretty good, with only 5 parameters. See James’ paper for a more detailed discussion.
Let’s move this model into Stan. Here is my attempt, which builds on last week’s pooled model. With the generated quantities code block I go beyond the scope of the original paper, as I try to estimate the full posterior predictive distribution as well.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
data { | |
int<lower=0> N; // total number of rows | |
real cum[N]; // cumulative paid | |
real dev[N]; // development period | |
int<lower=0> n_origin; // number of origin years | |
int<lower=1, upper=n_origin> origin[N]; // origin years | |
// Treat future payments as missing data, see BUGS book: | |
// http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/bugsbook_chapter9.pdf, page 194 | |
// and Stan Manual | |
int<lower=0> N_mis; // number of rows of prediction data set | |
real dev_mis[N_mis]; // development periods to predict | |
int<lower=1, upper=n_origin> origin_mis[N_mis]; // origin periods to predict | |
} | |
parameters { | |
real<lower=0> theta; // scale parameter | |
real<lower=0> omega; // shape parameter | |
real<lower=0> ult[n_origin]; // ultimate loss per origin period | |
real<lower=0> mu_ult; // mean ultimate loss across origin periods | |
real<lower=0> sigma; // process error | |
real<lower=0> sigma_ult; // random error | |
real cum_mis[N_mis]; // cumulative paid to predict | |
} | |
model { | |
real mu[N]; | |
real mu_mis[N_mis]; | |
real disp_sigma[N]; | |
real disp_sigma_mis[N_mis]; | |
// Priors | |
theta ~ normal(46, 10); // scale parameter | |
omega ~ normal(1, 2); // shape parameter | |
mu_ult ~ normal(5000, 1000); | |
sigma ~ cauchy(0, 5); | |
sigma_ult ~ cauchy(0, 5); | |
// Hyperparameters: Modelled parameters | |
ult ~ normal(mu_ult, sigma_ult); | |
for (i in 1:N){ | |
mu[i] = ult[origin[i]] * weibull_cdf(dev[i], omega, theta); | |
disp_sigma[i] = sigma * sqrt(mu[i]); | |
} | |
for (i in 1:N_mis){ | |
mu_mis[i] = ult[origin_mis[i]] * weibull_cdf(dev_mis[i], omega, theta); | |
disp_sigma_mis[i] = sigma * sqrt(mu_mis[i]); | |
} | |
// Likelihood: Modelled data | |
cum ~ normal(mu, disp_sigma); | |
cum_mis ~ normal(mu_mis, disp_sigma_mis); | |
} | |
generated quantities{ | |
real Y_mean[N + N_mis]; | |
real Y_pred[N + N_mis]; | |
for(i in 1:N){ | |
// Posterior parameter distribution of the mean | |
Y_mean[i] = ult[origin[i]] * weibull_cdf(dev[i], omega, theta); | |
// Posterior predictive distribution | |
Y_pred[i] = normal_rng(Y_mean[i], sigma*sqrt(Y_mean[i])); | |
} | |
for(i in 1:N_mis){ | |
// Posterior parameter distribution of the mean | |
Y_mean[N + i] = ult[origin_mis[i]] * weibull_cdf(dev_mis[i], omega, theta); | |
// Posterior predictive distribution | |
Y_pred[N + i] = normal_rng(Y_mean[N + i], sigma*sqrt(Y_mean[N + i])); | |
} | |
} |
mu[i] <- ult[origin[i]] * weibull_cdf(dev[i], omega, theta);
where I have an accident year (here labelled origin) specific ultimate loss. The notation
ult[origin[i]]
illustrates the hierarchical nature in Stan's language nicely.Let's run the model:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(rstan) | |
rstan_options(auto_write = TRUE) | |
options(mc.cores = parallel::detectCores()) | |
mlstanDso <- stan_model(file = "Hierachical_Model_for_Loss_Reserving.stan", | |
model_name = "MultiLevelGrowthCurve") | |
newdat <- newdat[is.na(newdat$cum),] | |
mlfit <- sampling(mlstanDso, iter=7000, warmup=2000, | |
thin=2, chains=4, | |
data = list(N = nrow(dat), | |
cum=dat$cum, | |
origin=dat$origin, | |
n_origin=length(unique(dat$origin)), | |
dev=dat$dev, | |
N_mis=nrow(newdat), | |
origin_mis=newdat$origin, | |
dev_mis=newdat$dev)) | |
print(mlfit, c("mu_ult", "omega", "theta", "sigma_ult", "sigma"), | |
probs=c(0.5, 0.75, 0.975)) | |
## Inference for Stan model: MultiLevelGrowthCurve. | |
## 4 chains, each with iter=2000; warmup=1000; thin=1; | |
## post-warmup draws per chain=1000, total post-warmup draws=4000. | |
## | |
## mean se_mean sd 50% 75% 97.5% n_eff Rhat | |
## mu_ult 5332.73 10.63 274.01 5320.96 5509.15 5897.11 665 1.01 | |
## omega 1.30 0.00 0.03 1.30 1.32 1.36 890 1.01 | |
## theta 47.27 0.12 2.52 47.10 48.82 52.80 433 1.02 | |
## sigma_ult 593.18 4.87 170.23 563.95 678.59 1002.70 1224 1.00 | |
## sigma 3.10 0.01 0.34 3.08 3.30 3.86 1107 1.00 | |
## | |
## Samples were drawn using NUTS(diag_e) at Sun Oct 30 06:21:19 2016. | |
## For each parameter, n_eff is a crude measure of effective sample size, | |
## and Rhat is the potential scale reduction factor on split chains (at | |
## convergence, Rhat=1). | |
ult <- as.data.frame(rstan::extract(mlfit, paste0("ult[", 1:10, "]"))) | |
summary(apply(ult, 1, sum)) | |
## Min. 1st Qu. Median Mean 3rd Qu. Max. | |
## 47540 52130 53420 53490 54830 63060 | |
rstan::traceplot(mlfit, c("mu_ult", "omega", "theta", "sigma_ult", "sigma")) |
nlme
output above.Let's take a look at the parameter traceplot and the densities of the estimated ultimate loss costs by origin year.
This looks all not too bad. The trace plots don't show any particular patterns, apart from (sigma_{ult}), which shows a little skewness.
The
generated quantities
code block in Stan allows me to get also the predictive distribution beyond the current data range. Here I forecast claims up to development year 12 and plot the predictions, including the 90% credibility interval of the posterior predictive distribution with the observations.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Y_mean <- rstan::extract(mlfit, "Y_mean") | |
Y_mean_cred <- apply(Y_mean$Y_mean, 2, quantile, c(0.025, 0.975)) | |
Y_mean_mean <- apply(Y_mean$Y_mean, 2, mean) | |
Y_pred <- rstan::extract(mlfit, "Y_pred") | |
Y_pred_cred <- apply(Y_pred$Y_pred, 2, quantile, c(0.025, 0.975)) | |
Y_pred_mean <- apply(Y_pred$Y_pred, 2, mean) | |
dat2 <- rbind(dat, newdat) | |
dat2$Y_pred_mean <- Y_pred_mean | |
dat2$Y_pred_cred5 <- Y_pred_cred[1,] | |
dat2$Y_pred_cred95 <- Y_pred_cred[2,] | |
library(lattice) | |
key <- list( | |
rep=FALSE, | |
lines=list(col=c("#00526D", "purple"), type=c("p","l"), pch=1), | |
text=list(lab=c("Observation","Mean Estimate")), | |
rectangles = list(col=adjustcolor("yellow", alpha.f=0.5), border="grey"), | |
text=list(lab="95% Prediction credible interval")) | |
xyplot( Y_pred_cred5 + Y_pred_cred95 + Y_pred_mean + cum ~ dev | factor(AY), | |
data=dat2, as.table=TRUE, | |
xlab="dev", ylab="cum", | |
main="Weibull Growth Curve with Random Effect", | |
scales=list(alternating=1), layout=c(5,2), key=key, | |
panel=function(x, y){ | |
n <- length(x) | |
k <- n/2 | |
upper <- y[(k/2+1):k] | |
lower <- y[1:(k/2)] | |
x <- x[1:(k/2)] | |
panel.polygon(c(x, rev(x)), c(upper, rev(lower)), | |
col = adjustcolor("yellow", alpha.f = 0.5), | |
border = "grey") | |
panel.lines(x, y[(k+1):(k+n/4)], col="purple") | |
panel.points(x, y[(n*3/4+1):n], lwd=2, col="#00526D") | |
}) |
The model seems to work rather well, even with the Gaussian distribution assumptions. Yet, it has still only 5 parameters. Note, this model doesn't need an additional artificial tail factor either.
Conclusions
The Bayesian approach sounds to me a lot more natural than many classical techniques around the chain-ladder methods. Thanks to Stan, I can get the full posterior distributions on both, the parameters and predictive distribution. I find communicating credibility intervals much easier than talking about the parameter, process and mean squared error.James Guszcza contributed to a follow up paper with Y. Zhank and V. Dukic [3] that extends the model described in [2]. It deals with skewness in loss data sets and the autoregressive nature of the errors in a cumulative time series.
Fank Schmid offers a more complex Bayesian analysis of claims reserving in [4], while Jake Morris highlights the similarities between a compartmental model used in drug research and loss reserving [5].
Finally, Glenn Meyers published a monograph on Stochastic Loss Reserving Using Bayesian MCMC Models earlier this year [7] that is worth taking a look at.
References
[1] David R. Clark. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society, 2003. CAS Fall Forum.[2] James Guszcza. Hierarchical Growth Curve Models for Loss Reserving, 2008, CAS Fall Forum, pp. 146–173.
[3] Y. Zhang, V. Dukic, and James Guszcza. A Bayesian non-linear model for forecasting insurance loss payments. 2012. Journal of the Royal Statistical Society: Series A (Statistics in Society), 175: 637–656. doi: 10.1111/j.1467-985X.2011.01002.x
[4] Frank A. Schmid. Robust Loss Development Using MCMC. Available at SSRN. See also http://lossdev.r-forge.r-project.org/
[5] Jake Morris. Compartmental reserving in R. 2015. R in Insurance Conference.
[6] Stan Development Team. Stan: A C++ Library for Probability and Sampling, Version 2.8.0. 2015. http://mc-stan.org/.
[7] Glenn Meyers. Stochastic Loss Reserving Using Bayesian MCMC Models. Issue 1 of CAS Monograph Series. 2015.
Session Info
R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11.1 (El Capitan) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ChainLadder_0.2.3 rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1 [5] lattice_0.20-33 loaded via a namespace (and not attached): [1] nloptr_1.0.4 plyr_1.8.3 tools_3.2.2 [4] digest_0.6.8 lme4_1.1-10 statmod_1.4.21 [7] gtable_0.1.2 nlme_3.1-122 mgcv_1.8-8 [10] Matrix_1.2-2 parallel_3.2.2 biglm_0.9-1 [13] SparseM_1.7 proto_0.3-10 coda_0.18-1 [16] gridExtra_2.0.0 stringr_1.0.0 MatrixModels_0.4-1 [19] lmtest_0.9-34 stats4_3.2.2 grid_3.2.2 [22] nnet_7.3-11 tweedie_2.2.1 inline_0.3.14 [25] cplm_0.7-4 minqa_1.2.4 actuar_1.1-10 [28] reshape2_1.4.1 car_2.1-0 magrittr_1.5 [31] scales_0.3.0 codetools_0.2-14 MASS_7.3-44 [34] splines_3.2.2 rsconnect_0.3.79 systemfit_1.1-18 [37] pbkrtest_0.4-2 colorspace_1.2-6 quantreg_5.19 [40] labeling_0.3 sandwich_2.3-4 stringi_1.0-1 [43] munsell_0.4.2 zoo_1.7-12
To leave a comment for the author, please follow the link and comment on their blog: mages' blog.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.