Use domain knowledge to review prior distributions
At the Insurance Data Science conference, both Eric Novik and Paul-Christian Bürkner emphasised in their talks the value of thinking about the data generating process when building Bayesian statistical models. It is also a key step in Michael Betancourt’s Principled Bayesian Workflow.
In this post, I will discuss in more detail how to set priors and how to review the prior and posterior parameter distributions, as well as the prior predictive distributions, with brms (Bürkner 2017).
The prior predictive distribution shows me how the model behaves before I use my data. Thus, I can check if the model describes the data generating process reasonably well, before I go through the lengthy process of fitting the model.
As an example, I will get back to the non-linear hierarchical growth curve model, which is also part of the brms package vignette on non-linear models.
Load packages and data
First I load the relevant packages and data set.
library(rstan)
library(brms)
library(bayesplot)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
theme_update(text = element_text(family = "sans"))
library(data.table)
url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Data/ClarkTriangle.csv"
loss <- fread(url)
The aim is to use growth curves to model the claims payments of insurance losses from different accident years (1991 - 2000) over time (development months) and to predict future payments.
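A quick look at the first few rows shows the structure of the data; the columns AY (accident year), dev (development month), cum (cumulative paid losses) and premium are the ones used below:
# Peek at the data; AY, dev, cum and premium are used in the models below.
head(loss)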
Prior predictive distribution
I will start with the same model as in the brms vignette, but instead of fitting the model, I set the parameter sample_prior = "only" to generate samples from the prior predictive distribution only, i.e. the data will be ignored and only the prior distributions will be used.
nlform1 <- bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
              ult ~ 1 + (1|AY), omega ~ 1, theta ~ 1,
              nl = TRUE)
m1 <- brm(nlform1, data = loss,
          family = gaussian(),
          prior = c(
            prior(normal(5000, 1000), nlpar = "ult"),
            prior(normal(1, 2), nlpar = "omega"),
            prior(normal(45, 10), nlpar = "theta")
          ),
          sample_prior = "only",
          seed = 1234,
          control = list(adapt_delta = 0.9))
I can use the same plots to review the prior predictive distribution as I would have done for the posterior predictive distribution.
conditions <- data.frame(AY = unique(loss$AY))
rownames(conditions) <- unique(loss$AY)
me_loss_prior1 <- marginal_effects(
  m1, conditions = conditions,
  re_formula = NULL, method = "predict"
)
p0 <- plot(me_loss_prior1, ncol = 5, points = TRUE, plot = FALSE)
p0$dev + ggtitle("Prior predictive distributions")
Perhaps this doesn’t look too bad, but the model also generates many negative claims payments, as I have assumed a Gaussian distribution. Yet customers rarely pay claims back to the insurance company.
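A quick way to quantify this is to look at the share of negative values in the prior predictive draws. A minimal sketch: since the model was run with sample_prior = "only", posterior_predict() returns draws from the prior predictive distribution (the exact share will depend on the seed).
# Share of negative values in the prior predictive draws of m1.
pp1 <- posterior_predict(m1)
mean(pp1 < 0)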
In addition, this model simplifies the variance structure to a constant \(\sigma^2\). In Guszcza (2008), Jim Guszcza suggested a variance that is proportional to the claims amount instead.
Review model code and priors
Let’s look at the part of the Stan code generated by brm that describes the priors, to better understand the model (the full code can be extracted with stancode(m1)):
// priors including all constants
target += normal_lpdf(b_ult | 5000, 1000);
target += normal_lpdf(b_omega | 1, 2);
target += normal_lpdf(b_theta | 45, 10);
target += student_t_lpdf(sigma | 3, 0, 1964)
  - 1 * student_t_lccdf(0 | 3, 0, 1964);
target += student_t_lpdf(sd_1 | 3, 0, 1964)
  - 1 * student_t_lccdf(0 | 3, 0, 1964);
target += normal_lpdf(z_1[1] | 0, 1);
// likelihood including all constants
if (!prior_only) {
  target += normal_lpdf(Y | mu, sigma);
}
The code shows the default priors set by brm for sigma and sd_1, which I didn’t set explicitly. In addition, the last line shows the switch that turns off sampling from the posterior distribution.
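If you are unsure which default priors brm will assume, you can also review them before compiling anything with get_prior:
# Review the priors brm() would use for this formula, data and family,
# including the default Student-t priors for sigma and the group-level sd.
get_prior(nlform1, data = loss, family = gaussian())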
Putting the model in ‘Greek’ makes it more readable:
\[
\begin{align}
loss_{AY}(t) & \sim \mathcal{N}\left( \mu_{AY}(t), \sigma^2\right) \\
\mu_{AY}(t) & = \gamma_{AY} \cdot G(t; \omega, \theta)\\
G(t; \omega, \theta) & = 1 - \exp\left(-\left({t/\theta}\right)^\omega\right)\\
\gamma_{AY} & = \gamma + \gamma_{AY}^0 \\
\mbox{Priors}&:\\
\gamma & \sim \mathcal{N}\left(5000, 1000^2\right)\\
\omega & \sim \mathcal{N}\left(1, 2^2\right)\\
\theta & \sim \mathcal{N}\left(45, 10^2\right) \\
\sigma & \sim \mbox{Student-t}\left(3, 0, 1964\right)^+\\
\gamma_{AY}^0 & \sim \mathcal{N}(0, \sigma_{\gamma^0}^2) \\
\sigma_{\gamma^0} & \sim \mbox{Student-t}\left(3, 0, 1964\right)^+\\
\end{align}
\]
Utilise domain knowledge
There are a few aspects of the original model that I would like to change:
- Use the loss ratio instead of the loss amounts to make data from the different years more comparable and to bring values closer to 1
- Change the distribution family from Gaussian to lognormal to avoid negative payments being generated
- Assume a constant \(\sigma\) parameter, which in the case of the lognormal distribution means a constant coefficient of variation, i.e. the standard deviation is proportional to the mean (see the short derivation after this list)
- Use development year instead of development month to reduce the scale of \(t\)
- Use parameter \(\phi=1/\theta\), since I believe \(0< \phi < 1\).
- Shrink uncertainty and shift \(\omega\), as I believe \(\omega\) will be around 1.25
- With the move to loss ratios the standard deviations \(\sigma_{\gamma^0}\) and \(\sigma\) will be much smaller. Therefore, I reduce the scale parameter and increase the degrees of freedom from 3 to 5 for the Student-t distributions
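To see why a constant \(\sigma\) implies a constant coefficient of variation under the lognormal distribution, recall its moments:
\[
\begin{align}
X & \sim \log \mathcal{N}\left(\mu, \sigma^2\right) \\
\mathbb{E}[X] & = \exp\left(\mu + \sigma^2/2\right) \\
\mbox{SD}[X] & = \mathbb{E}[X]\sqrt{\exp\left(\sigma^2\right) - 1} \\
\mbox{CV}[X] & = \frac{\mbox{SD}[X]}{\mathbb{E}[X]} = \sqrt{\exp\left(\sigma^2\right) - 1}
\end{align}
\]
The coefficient of variation depends on \(\sigma\) alone, so a constant \(\sigma\) makes the standard deviation proportional to the mean.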
My new model looks like this now:
\[
\begin{align}
\ell_{AY}(t) & \sim \log \mathcal{N}\left( \log(\mu_{AY}(t)), \sigma^2\right) \\
\mu_{AY}(t) & = \gamma_{AY} \cdot G(t; \omega, \phi)\\
G(t;\omega, \phi) & = 1 - \exp\left(-\left(t\cdot \phi\right)^\omega\right)\\
\gamma_{AY} & = \gamma + \gamma_{AY}^0 \\
\mbox{Priors}&:\\
\gamma & \sim \log \mathcal{N}\left(\log(0.5), \log(1.2)^2\right)\\
\omega & \sim \mathcal{N}\left(1.25, 0.25^2\right)^+\\
\phi & \sim \mathcal{N}\left(0.25, 0.25^2\right)^+ \\
\sigma & \sim \mbox{Student-t}\left(5, 0, 0.25\right)^+\\
\gamma_{AY}^0 & \sim \mathcal{N}(0, \sigma_{\gamma^0}^2) \\
\sigma_{\gamma^0} & \sim \mbox{Student-t}\left(5, 0, 0.25\right)^+\\
\end{align}
\]
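Before handing the new model to brms, here is a quick base R sketch (my own addition, not part of the original vignette) of what the priors for \(\omega\) and \(\phi\) imply for the growth curve \(G(t)\):
# Simulate growth curves G(t) = 1 - exp(-(t * phi)^omega) from the new
# priors over development years 1 to 10. pmax() is a crude stand-in for
# the truncation at zero of the half-normal priors.
set.seed(1234)
dev_yrs <- 1:10
omega <- pmax(rnorm(100, 1.25, 0.25), 0.01)
phi   <- pmax(rnorm(100, 0.25, 0.25), 0.01)
G <- sapply(seq_along(omega),
            function(i) 1 - exp(-(dev_yrs * phi[i])^omega[i]))
matplot(dev_yrs, G, type = "l", lty = 1, col = rgb(0, 0, 0, 0.2),
        xlab = "Development year", ylab = "G(t)",
        main = "Growth curves implied by the priors")
Most simulated curves level off within the ten development years, which matches my expectation of how claims develop.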
Assuming a constant coefficient of variation across development time seems broadly okay:
loss <- loss[, `:=`(loss_ratio = cum/premium, dev_year = (dev + 6)/12)]
print(loss[, list(`CV(loss_ratio)` = sd(loss_ratio)/mean(loss_ratio)),
           by = dev_year], digits = 1)
##     dev_year CV(loss_ratio)
##  1:        1           0.14
##  2:        2           0.08
##  3:        3           0.08
##  4:        4           0.15
##  5:        5           0.13
##  6:        6           0.09
##  7:        7           0.11
##  8:        8           0.14
##  9:        9           0.21
## 10:       10             NA
To review whether my new model makes sense, I start by generating samples from the priors again:
nlform2 <- bf(loss_ratio ~ log(ulr * (1 - exp(-(dev_year*phi)^omega))),
              ulr ~ 1 + (1|AY), omega ~ 1, phi ~ 1,
              nl = TRUE)
m2 <- brm(nlform2, data = loss,
          family = lognormal(link = "identity", link_sigma = "log"),
          prior = c(
            prior(lognormal(log(0.5), log(1.2)), nlpar = "ulr", lb = 0),
            prior(normal(1.25, 0.25), nlpar = "omega", lb = 0),
            prior(normal(0.25, 0.25), nlpar = "phi", lb = 0),
            prior(student_t(5, 0, 0.25), class = "sigma"),
            prior(student_t(5, 0, 0.25), class = "sd", nlpar = "ulr")
          ),
          sample_prior = "only",
          seed = 1234)
Prior parameter distributions
mcmc_areas(
  as.array(m2),
  pars = c("b_ulr_Intercept", "b_omega_Intercept", "b_phi_Intercept",
           "sd_AY__ulr_Intercept", "sigma"),
  prob = 0.8,        # 80% intervals
  prob_outer = 0.99, # 99%
  point_est = "mean"
) + ggplot2::labs(
  title = "Prior parameter distributions",
  subtitle = "with means and 80% intervals"
)
The prior parameter distributions shown in the plot above look very much as I would expect.
Prior predictive distributions
me_loss_prior2 <- marginal_effects(
  m2, conditions = conditions,
  re_formula = NULL, method = "predict"
)
p1 <- plot(me_loss_prior2, ncol = 5, points = TRUE, plot = FALSE)
p1$dev + ggtitle("Prior predictive distributions")
The prior predictive distributions also look more in line with the data.
I am happy with the model as it is. In my next step, I start using the data to fit the model.
Generate posterior samples
To fit my model to the data I call update and set the parameter sample_prior = "no". Note that the model doesn’t need to be recompiled.
(fit_m1 <- update(m2, sample_prior = "no", seed = 1234))
##  Family: lognormal
##   Links: mu = identity; sigma = identity
## Formula: loss_ratio ~ log(ulr * (1 - exp(-(dev_year * phi)^omega)))
##          ulr ~ 1 + (1 | AY)
##          omega ~ 1
##          phi ~ 1
##    Data: loss (Number of observations: 55)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~AY (Number of levels: 10)
##                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(ulr_Intercept)     0.04      0.01     0.02     0.07       1117 1.00
##
## Population-Level Effects:
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## ulr_Intercept       0.42      0.02     0.38     0.47       1759 1.00
## omega_Intercept     1.86      0.05     1.76     1.95       2558 1.00
## phi_Intercept       0.26      0.01     0.23     0.28       2101 1.00
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.10      0.01     0.08     0.12       2379 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The model looks well behaved at first glance.
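Beyond the summary table, a quick graphical check of the fit is available via pp_check, which overlays densities of replicated data against the observed loss ratios:
# Graphical posterior predictive check (density overlay by default).
pp_check(fit_m1)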
Posterior parameter distributions
The plot of the posterior parameter distributions shows nicely how my priors have shrunk:
mcmc_areas(
  as.array(fit_m1),
  pars = c("b_ulr_Intercept", "b_omega_Intercept", "b_phi_Intercept",
           "sd_AY__ulr_Intercept", "sigma"),
  prob = 0.8,        # 80% intervals
  prob_outer = 0.99, # 99%
  point_est = "mean"
) + ggplot2::labs(
  title = "Posterior parameter distributions",
  subtitle = "with means and 80% intervals"
)
Posterior predictive distributions
Finally, it is time to plot the posterior predictive distributions:
me_loss_posterior <- marginal_effects(
  fit_m1, conditions = conditions,
  re_formula = NULL, method = "predict"
)
p2 <- plot(me_loss_posterior, ncol = 5, points = TRUE, plot = FALSE)
p2$dev + ggtitle("Posterior predictive distributions")
Conclusions
The plot looks good, but the model underestimates the claims development for the 1992 and 1993 accident years.
- Are those accident years outliers in the data?
- Should the model allow the parameters \(\omega\) and \(\phi\) to vary by year (see the sketch after this list)?
- Should I consider a different growth curve family?
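For the second question, letting \(\omega\) and \(\phi\) vary by accident year would only require extending the formula. A sketch (not fitted here; priors for the additional group-level standard deviations would still have to be chosen):
# Hypothetical extension: accident-year random effects for omega and
# phi as well as for the ultimate loss ratio.
nlform3 <- bf(loss_ratio ~ log(ulr * (1 - exp(-(dev_year * phi)^omega))),
              ulr ~ 1 + (1 | AY),
              omega ~ 1 + (1 | AY),
              phi ~ 1 + (1 | AY),
              nl = TRUE)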
Tapping into expert knowledge can be invaluable. However, many domain knowledge experts will not be statisticians and will find it difficult to form an opinion on the ‘Greek’ model or the parameters.
Yet, often experts will have a view on the data generated by the model. Showing them the output of the prior and posterior predictive distributions can be a lot more fruitful.
And as I said earlier, don’t forget they are experts, and hence likely to disagree with the non-expert. Use this bias to your advantage!
Session Info
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] bindrcpp_0.2.2      latticeExtra_0.6-28 RColorBrewer_1.1-2
##  [4] lattice_0.20-38     data.table_1.11.8   bayesplot_1.6.0
##  [7] brms_2.6.0          Rcpp_1.0.0          rstan_2.18.2
## [10] StanHeaders_2.18.0  ggplot2_3.1.0
##
## loaded via a namespace (and not attached):
##  [1] Brobdingnag_1.2-6    gtools_3.8.1         threejs_0.3.1
##  [4] shiny_1.2.0          assertthat_0.2.0     stats4_3.5.1
##  [7] yaml_2.2.0           pillar_1.3.0         backports_1.1.2
## [10] glue_1.3.0           digest_0.6.18        promises_1.0.1
## [13] colorspace_1.3-2     htmltools_0.3.6      httpuv_1.4.5
## [16] Matrix_1.2-15        plyr_1.8.4           dygraphs_1.1.1.6
## [19] pkgconfig_2.0.2      bookdown_0.8         purrr_0.2.5
## [22] xtable_1.8-3         mvtnorm_1.0-8        scales_1.0.0
## [25] processx_3.2.0       later_0.7.5          tibble_1.4.2
## [28] DT_0.5               shinyjs_1.0          withr_2.1.2
## [31] lazyeval_0.2.1       cli_1.0.1            magrittr_1.5
## [34] crayon_1.3.4         mime_0.6             evaluate_0.12
## [37] ps_1.2.1             nlme_3.1-137         xts_0.11-2
## [40] pkgbuild_1.0.2       blogdown_0.9         colourpicker_1.0
## [43] rsconnect_0.8.12     tools_3.5.1          loo_2.0.0
## [46] prettyunits_1.0.2    matrixStats_0.54.0   stringr_1.3.1
## [49] munsell_0.5.0        callr_3.0.0          compiler_3.5.1
## [52] rlang_0.3.0.1        debugme_1.1.0        grid_3.5.1
## [55] ggridges_0.5.1       htmlwidgets_1.3      crosstalk_1.0.0
## [58] igraph_1.2.2         miniUI_0.1.1.1       labeling_0.3
## [61] base64enc_0.1-3      rmarkdown_1.10       codetools_0.2-15
## [64] gtable_0.2.0         curl_3.2             inline_0.3.15
## [67] abind_1.4-5          reshape2_1.4.3       markdown_0.8
## [70] R6_2.3.0             gridExtra_2.3        rstantools_1.5.1
## [73] zoo_1.8-4            knitr_1.20           bridgesampling_0.6-0
## [76] dplyr_0.7.8          shinythemes_1.1.2    bindr_0.1.1
## [79] shinystan_2.5.0      rprojroot_1.3-2      stringi_1.2.4
## [82] parallel_3.5.1       tidyselect_0.2.5     xfun_0.4
## [85] coda_0.19-2
References
Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
Guszcza, James. 2008. “Hierarchical Growth Curve Models for Loss Reserving.” In CAS Fall Forum, 146–73. https://www.casact.org/pubs/forum/08fforum/7Guszcza.pdf.