Site icon R-bloggers

Model Operational Loss Directly with Tweedie GLM

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the development of operational loss forecasting models, the Frequency-Severity modeling approach, which the frequency and the severity of a Unit of Measure (UoM) are modeled separately, has been widely employed in the banking industry. However, sometimes it also makes sense to model the operational loss directly, especially for UoMs with non-material losses. First of all, given the low loss amount, the effort of developing two models, e.g. frequency and severity, might not be justified. Secondly, for UoMs with low losses due to low frequencies, modeling the frequency and the severity separately might overlook the internal connection between the low frequency and the subsequent low loss amount. For instance, when the frequency N = 0, then the loss L = $0 inevitably.

The Tweedie distribution is defined as a Poisson sum of Gamma random variables. In particular, if the frequency of loss events N is assumed a Poisson distribution and the loss amount L_i of an event i, where i = 0, 1 … N, is assumed a Gamma distribution, then the total loss amount L = SUM[L_i] would have a Tweedie distribution. When there is no loss event, e.g. N = 0, then Prob(L = $0) = Prob(N = 0) = Exp(-Lambda). However, when N > 0, then L = L_1 + … + L_N > $0 is governed by a Gamma distribution, e.g. sum of I.I.D. Gamma also being Gamma.

For the Tweedie loss, E(L) = Mu and VAR(L) = Phi * (Mu ** P), where P is called the index parameter and Phi is the dispersion parameter. When P approaches 1 and therefore VAR(L) approaches Phi * E(L), the Tweedie would be similar to a Poisson-like distribution. When P approaches 2 and therefore VAR(L) approaches Phi * (E(L) ** 2), the Tweedie would be similar to a Gamma distribution. When P is between 1 and 2, then the Tweedie would be a compound mixture of Poisson and Gamma, where P and Phi can be estimated.

To estimate a regression with the Tweedie distributional assumption, there are two implementation approaches in R with cplm and statmod packages respectively. With the cplm package, the Tweedie regression can be estimated directly as long as P is in the range of (1, 2), as shown below. In the example, the estimated index parameter P is 1.42.

> library(cplm)
> data(FineRoot)
> m1 <- cpglm(RLD ~ Zone + Stock, data = FineRoot)
> summary(m1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0611  -0.6475  -0.3928   0.1380   1.9627  

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.95141    0.14643 -13.327  < 2e-16 ***
ZoneOuter   -0.85693    0.13292  -6.447 2.66e-10 ***
StockMM106   0.01177    0.17535   0.067    0.947    
StockMark   -0.83933    0.17476  -4.803 2.06e-06 ***
---
Estimated dispersion parameter: 0.35092
Estimated index parameter: 1.4216 

Residual deviance: 203.91  on 507  degrees of freedom
AIC:  -157.33 

The statmod package provides a more general and flexible solution with the two-stage estimation, which will estimate the P parameter first and then estimate regression parameters. In the real-world practice, we could do a coarse search to narrow down a reasonable range of P and then do a fine search to identify the optimal P value. As shown below, all estimated parameters are fairly consistent with ones in the previous example.

> library(tweedie)
> library(statmod)
> prof <- tweedie.profile(RLD ~ Zone + Stock, data = FineRoot, p.vec = seq(1.1, 1.9, 0.01), method = "series")
1.1 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19 1.2 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.3 1.31 1.32 1.33 1.34 1.35 1.36 1.37 1.38 1.39 1.4 1.41 1.42 1.43 1.44 1.45 1.46 1.47 1.48 1.49 1.5 1.51 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59 1.6 1.61 1.62 1.63 1.64 1.65 1.66 1.67 1.68 1.69 1.7 1.71 1.72 1.73 1.74 1.75 1.76 1.77 1.78 1.79 1.8 1.81 1.82 1.83 1.84 1.85 1.86 1.87 1.88 1.89 1.9 
.................................................................................Done.
> prof$p.max
[1] 1.426531
> m2 <- glm(RLD ~ Zone + Stock, data = FineRoot, family = tweedie(var.power = prof$p.max, link.power = 0))
> summary(m2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0712  -0.6559  -0.3954   0.1380   1.9728  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.95056    0.14667 -13.299  < 2e-16 ***
ZoneOuter   -0.85823    0.13297  -6.454 2.55e-10 ***
StockMM106   0.01204    0.17561   0.069    0.945    
StockMark   -0.84044    0.17492  -4.805 2.04e-06 ***
---
(Dispersion parameter for Tweedie family taken to be 0.4496605)

    Null deviance: 241.48  on 510  degrees of freedom
Residual deviance: 207.68  on 507  degrees of freedom
AIC: NA

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.