Some Considerations of Modeling Severity in Operational Losses

statcompute

7 years ago

[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the Loss Distributional Approach (LDA) to Operational Risk modeling, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered as candidates for the distribution of severity measures. However, in stress testing exercises such as CCAR, the challenge remains to relate operational losses to macro-economic scenarios described by a set of macro-economic attributes.

As a result, a more sensible approach to modeling operational losses in the annual CCAR exercise might be regression-based modeling, which intuitively links the severity measure of operational losses to macro-economic drivers through an explicit functional form within the framework of Generalized Linear Models (GLM). While the 2-parameter Pareto and 3-parameter Burr distributions are theoretically attractive, implementing them in a regression setting can become difficult or even impractical without off-the-shelf modeling tools and variable selection routines. In such situations, Log Normal and Gamma distributional assumptions are much more practical, with successful applications in actuarial practice. For details, please see “Severity Distributions for GLMs” by Fu and Moncher (2004).
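As a minimal sketch of the regression-based idea, the R code below simulates loss severities whose log-mean depends on two macro-economic drivers and recovers that relationship with a Gamma GLM using a log link. The driver names (unemp_rate, gdp_growth), their coefficients, the Gamma shape and the stressed scenario values are all hypothetical assumptions for illustration, not CCAR scenario variables or estimates from real loss data.

```r
set.seed(2024)
n <- 2000

# hypothetical macro-economic drivers (simulated, not real scenario data)
macro <- data.frame(
  unemp_rate = runif(n, 3, 10),  # unemployment rate, percent
  gdp_growth = rnorm(n, 2, 1.5)  # real GDP growth, percent
)

# assumed true model: log(E[severity]) = 8 + 0.10 * unemp_rate - 0.05 * gdp_growth
mu <- exp(8 + 0.10 * macro$unemp_rate - 0.05 * macro$gdp_growth)
shape <- 2                                                     # assumed Gamma shape
macro$severity <- rgamma(n, shape = shape, rate = shape / mu)  # mean equals mu

# Gamma regression with log link ties mean severity to the drivers
fit <- glm(severity ~ unemp_rate + gdp_growth, data = macro,
           family = Gamma(link = "log"))
print(round(coef(fit), 3))

# multiplicative effect on expected severity of a 1-point rise in unemployment
print(exp(coef(fit)[["unemp_rate"]]))

# expected severity under a hypothetical stressed scenario
stress <- data.frame(unemp_rate = 10, gdp_growth = -3)
print(predict(fit, newdata = stress, type = "response"))
```

With a log link, predicted severities stay positive and each coefficient reads as a proportional effect on mean severity, which keeps the functional form easy to explain when scenario values are plugged in.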

While Log Normal and Gamma are both among the most popular choices for the severity model, each has its pros and cons. For instance, although the Log Normal distributional assumption is extremely flexible and easy to understand, its predicted outcomes need to be adjusted for the bias introduced when predictions on the log scale are transformed back to the original scale. Fortunately, both SAS, e.g. PROC SEVERITY, and R, e.g. the fitdistrplus package, provide convenient interfaces for selecting a distribution based on goodness-of-fit statistics and information criteria.


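library(fitdistrplus)   # fitdist(), gofstat()
library(insuranceData)  # AutoCollision data set
data(AutoCollision)
# fit Log Normal and Gamma severity distributions by matching moments
Fit1 <- fitdist(AutoCollision$Severity, distr = "lnorm", method = "mme")
Fit2 <- fitdist(AutoCollision$Severity, distr = "gamma", method = "mme")
# compare goodness-of-fit statistics and information criteria
gofstat(list(Fit1, Fit2))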

#Goodness-of-fit statistics
#                             1-mme-lnorm 2-mme-gamma
#Kolmogorov-Smirnov statistic   0.1892567   0.1991059
#Cramer-von Mises statistic     0.2338694   0.2927953
#Anderson-Darling statistic     1.5772642   1.9370056
#
#Goodness-of-fit criteria
#                               1-mme-lnorm 2-mme-gamma
#Aikake's Information Criterion    376.2738    381.2264
#Bayesian Information Criterion    379.2053    384.1578

In the above output, Log Normal fits marginally better than Gamma in this particular case, with smaller values of every goodness-of-fit statistic and of both AIC and BIC. Since log(Severity) under Log Normal (a Normal distribution) and Severity under Gamma both belong to the exponential family, it is convenient to employ glm() with related variable selection routines in the model development exercise.
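To illustrate that convenience, the hedged sketch below applies the same AIC-based stepwise routine, step() from base R's stats package, to both a Gaussian GLM on log(severity) and a Gamma GLM with log link. The data are simulated and the drivers x1 to x3 are hypothetical; by construction only x1 affects severity.

```r
set.seed(42)
n <- 1000

# hypothetical candidate drivers; only x1 matters by construction
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$severity <- exp(7 + 0.5 * dat$x1 + rnorm(n, sd = 0.8))  # Log Normal severity

# full models under the two distributional assumptions
full_ln <- glm(log(severity) ~ x1 + x2 + x3, data = dat,
               family = gaussian(link = "identity"))
full_ga <- glm(severity ~ x1 + x2 + x3, data = dat,
               family = Gamma(link = "log"))

# backward elimination by AIC: the same call works for either family
sel_ln <- step(full_ln, direction = "backward", trace = 0)
sel_ga <- step(full_ga, direction = "backward", trace = 0)

print(formula(sel_ln))
print(formula(sel_ga))
```

Which noise terms survive depends on the random draw, so the point here is the shared workflow rather than any particular selected formula.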


# Log Normal regression: Gaussian GLM on log(Severity), one coefficient per Vehicle_Use level
summary(mdl1 <- glm(log(Severity) ~ -1 + Vehicle_Use, data = AutoCollision, family = gaussian(link = "identity")))

#Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
#Vehicle_UseBusiness    5.92432    0.07239   81.84   <2e-16 ***
#Vehicle_UseDriveLong   5.57621    0.07239   77.03   <2e-16 ***
#Vehicle_UseDriveShort  5.43405    0.07239   75.07   <2e-16 ***
#Vehicle_UsePleasure    5.35171    0.07239   73.93   <2e-16 ***

# Gamma regression with log link on Severity itself, same design
summary(mdl2 <- glm(Severity ~ -1 + Vehicle_Use, data = AutoCollision, family = Gamma(link = "log")))

#Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
#Vehicle_UseBusiness    5.97940    0.08618   69.38   <2e-16 ***
#Vehicle_UseDriveLong   5.58072    0.08618   64.76   <2e-16 ***
#Vehicle_UseDriveShort  5.44560    0.08618   63.19   <2e-16 ***
#Vehicle_UsePleasure    5.36225    0.08618   62.22   <2e-16 ***

As shown above, the estimated coefficients are very similar in the Log Normal and Gamma regressions, while the standard errors differ due to the different distributional assumptions. However, please note that predicted values of the Log Normal regression, which are on the log scale, should be adjusted by adding (RMSE ^ 2) / 2 before applying exp().
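The adjustment follows from the Log Normal mean: if log(Y) ~ N(mu, sigma^2), then E[Y] = exp(mu + sigma^2 / 2), so exponentiating the log-scale prediction alone estimates the median rather than the mean. The sketch below checks this on simulated grouped data; the group labels and parameters are hypothetical and only mimic a four-level design like Vehicle_Use. The dispersion of the Gaussian glm(), i.e. the squared RMSE, stands in for sigma^2.

```r
set.seed(1)
grp <- factor(rep(c("A", "B", "C", "D"), each = 5000))  # hypothetical groups
true_mu <- c(A = 5.9, B = 5.6, C = 5.4, D = 5.35)       # assumed log-scale means
y <- exp(true_mu[as.character(grp)] + rnorm(length(grp), sd = 0.6))
dat <- data.frame(grp = grp, y = y)

# Log Normal regression fitted as a Gaussian GLM on log(y)
ln_fit <- glm(log(y) ~ -1 + grp, data = dat, family = gaussian(link = "identity"))
rmse2 <- summary(ln_fit)$dispersion  # residual variance, i.e. RMSE^2

new <- data.frame(grp = levels(dat$grp))
eta <- predict(ln_fit, newdata = new)  # predictions on the log scale
names(eta) <- new$grp
naive    <- exp(eta)              # estimates the median, biased low for the mean
adjusted <- exp(eta + rmse2 / 2)  # Log Normal mean correction

obs_mean <- tapply(dat$y, dat$grp, mean)  # empirical group means for comparison
print(round(cbind(naive, adjusted, obs_mean), 1))
```

By contrast, predict(..., type = "response") on a Gamma regression with a log link already returns estimates of E[Y], so no such correction is needed there.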

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.
