Modeling LGD with Proportional Odds Model
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The LGD model is an important component in the expected loss calculation. In https://statcompute.wordpress.com/2015/11/01/quasi-binomial-model-in-sas, I discussed how to model LGD with the quasi-binomial regression that is simple and makes no distributional assumption.
In the real-world LGD data, we usually would observe 3 ordered categories of values, including 0, 1, and in-betweens. In cases with a nontrivial number of 0 and 1 values, the ordered logit model, which is also known as Proportional Odds model, can be applicable. In the demonstration below, I will show how we can potentially use the proportional odds model in the LGD model development.
First of all, we need to categorize all numeric LGD values into three ordinal categories. As shown below, there are more than 30% of 0 and 1 values.
df <- read.csv("lgd.csv") df$lgd <- round(1 - df$Recovery_rate, 4) df$lgd_cat <- cut(df$lgd, breaks = c(-Inf, 0, 0.9999, Inf), labels = c("L", "M", "H"), ordered_result = T) summary(df$lgd_cat) # L M H # 730 1672 143
The estimation of a proportional odds model is straightforward with clm() in the ordinal package or polr() in the MASS package. As demonstrated below, in addition to the coefficient for LTV, there are 2 intercepts to differentiate 3 categories.
m1 <- ordinal::clm(lgd_cat ~ LTV, data = df) summary(m1) #Coefficients: # Estimate Std. Error z value Pr(>|z|) #LTV 2.0777 0.1267 16.4 <2e-16 *** #--- #Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # #Threshold coefficients: # Estimate Std. Error z value #L|M 0.38134 0.08676 4.396 #M|H 4.50145 0.14427 31.201
It is important to point out that, in a proportional odds model, it is the cumulative probability that is derived from the linear combination of model variables. For instance, the cumulative probability of LGD belong to L or M is formulated as
Prob(LGD <= M) = Exp(4.50 – 2.08 * LTV) / (1 + Exp(4.50 – 2.08 * LTV))
Likewise, we would have
Prob(LGD <= L) = Exp(0.38 – 2.08 * LTV) / (1 + Exp(0.38 – 2.08 * LTV))
With above cumulative probabilities, then we can calculate the probability of each category as below.
Prob(LGD = L) = Prob(LGD <= L)
Prob(LGD = M) = Prob(LGD <= M) – Prob(LGD <= L)
Prob(LGD = H) = 1 – Prob(LGD <= M)
The R code is showing the detailed calculation how to convert cumulative probabilities to probabilities of interest.
cumprob_L <- exp(df$LTV * (-m1$beta) + m1$Theta[1]) / (1 + exp(df$LTV * (-m1$beta) + m1$Theta[1])) cumprob_M <- exp(df$LTV * (-m1$beta) + m1$Theta[2]) / (1 + exp(df$LTV * (-m1$beta) + m1$Theta[2])) prob_L <- cumprob_L prob_M <- cumprob_M - cumprob_L prob_H <- 1 - cumprob_M pred <- data.frame(prob_L, prob_M, prob_H) apply(pred, 2, mean) # prob_L prob_M prob_H #0.28751210 0.65679888 0.05568903
After predicting the probability of each category, we would need another sub-model to estimate the conditional LGD for lgd_cat = “M” with either Beta or Simplex regression. (See https://statcompute.wordpress.com/2014/10/27/flexible-beta-modeling and https://statcompute.wordpress.com/2014/02/02/simplex-model-in-r) The final LGD prediction can be formulated as
E(LGD|X)
= Prob(Y = 0|X) * E(Y|X, Y = 0) + Prob(Y = 1|X) * E(Y|X, Y = 1) + Prob(0 < Y < 1|X) * E(Y|X, 0 < Y < 1)
= Prob(Y = 1|X) + Prob(0 < Y < 1|X) * E(Y|X, 0 < Y < 1)
where E(Y|X, 0 < Y < 1) can be calculated from the sub-model.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.