Deployment of Binning Outcomes in Production
In my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), I showed several monotonic binning algorithms that I have developed over time. However, these binning functions are of little use without a vehicle for deploying them in production. Over the weekend, I finally had time to draft an R function
(https://github.com/statcompute/MonotonicBinning/blob/master/code/calc_woe.R) that can be used to deploy the binning outcome and to apply the WoE transformation to an attribute in an input data frame.
Below is a complete example showing how to apply the binning function mono_bin() to an attribute named “ltv” in the data frame, generate the binning specification, and then deploy the binning logic to calculate the WoE transformation of “ltv”. The calc_woe() function returns two objects: the input data frame with a new column named “woe.ltv”, and a summary table showing the population stability index (PSI) of the input attribute “ltv”.
Everyone is welcome to use my R code and functions for their own purposes, but I would greatly appreciate it if you could reference the work and acknowledge my efforts.
url <- "https://github.com/statcompute/MonotonicBinning/blob/master/data/accepts.rds?raw=true"
download.file(url, "df.rds", mode = "wb")
df <- readRDS("df.rds")
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/manual_bin.R")
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/mono_bin.R")
ltv_bin <- mono_bin(df, bad, ltv)
ltv_bin$df
#   bin                 rule freq   dist mv_cnt bad_freq bad_rate     woe     iv      ks
# 1  01             $X <= 86 1108 0.1898      0      122   0.1101 -0.7337 0.0810 11.0448
# 2  02   $X > 86 & $X <= 95 1081 0.1852      0      166   0.1536 -0.3510 0.0205 16.8807
# 3  03  $X > 95 & $X <= 101 1102 0.1888      0      242   0.2196  0.0880 0.0015 15.1771
# 4  04 $X > 101 & $X <= 106  743 0.1273      0      177   0.2382  0.1935 0.0050 12.5734
# 5  05 $X > 106 & $X <= 115  935 0.1602      0      226   0.2417  0.2126 0.0077  8.9540
# 6  06 $X > 115 | is.na($X)  868 0.1487      1      263   0.3030  0.5229 0.0468  0.0000
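For readers who want to sanity-check the specification, the woe and iv columns can be recomputed from the freq and bad_freq counts printed above. The sketch below assumes the usual scorecard convention, WoE = log(share of bads in the bin / share of goods in the bin), which appears to match the sign and size of the printed values; it is not taken from the mono_bin() source.

```r
# Counts copied from the ltv_bin$df output above.
freq      <- c(1108, 1081, 1102, 743, 935, 868)
bad_freq  <- c(122, 166, 242, 177, 226, 263)
good_freq <- freq - bad_freq

# Share of all bads and of all goods that fall into each bin.
bad_dist  <- bad_freq / sum(bad_freq)
good_dist <- good_freq / sum(good_freq)

# Assumed convention: WoE = log(bad share / good share);
# IV contribution of a bin = (bad share - good share) * WoE.
woe <- log(bad_dist / good_dist)
iv  <- (bad_dist - good_dist) * woe

# Compare these with the woe and iv columns of the table.
round(woe, 4)
round(iv, 4)
sum(iv)  # total information value of the attribute
```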
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/calc_woe.R")
ltv_woe <- calc_woe(df[sample(seq(nrow(df)), 1000), ], ltv, ltv_bin$df)
ltv_woe$psi
#   bin                 rule   dist     woe cal_freq cal_dist cal_woe   psi
# 1  01             $X <= 86 0.1898 -0.7337      188    0.188 -0.7337 0e+00
# 2  02   $X > 86 & $X <= 95 0.1852 -0.3510      179    0.179 -0.3510 2e-04
# 3  03  $X > 95 & $X <= 101 0.1888  0.0880      192    0.192  0.0880 1e-04
# 4  04 $X > 101 & $X <= 106 0.1273  0.1935      129    0.129  0.1935 0e+00
# 5  05 $X > 106 & $X <= 115 0.1602  0.2126      167    0.167  0.2126 3e-04
# 6  06 $X > 115 | is.na($X) 0.1487  0.5229      145    0.145  0.5229 1e-04
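The psi column can be checked by hand in the same way. The sketch below assumes the standard per-bin definition, (cal_dist - dist) * log(cal_dist / dist), where dist is the development distribution and cal_dist is the distribution in the scored sample. That formula appears to reproduce the printed values, but it is an assumption about calc_woe() rather than a reading of its source.

```r
# Distributions copied from the ltv_woe$psi output above.
dist     <- c(0.1898, 0.1852, 0.1888, 0.1273, 0.1602, 0.1487)  # development sample
cal_dist <- c(0.188, 0.179, 0.192, 0.129, 0.167, 0.145)        # scored sample

# Assumed standard PSI contribution of each bin.
psi_bin <- (cal_dist - dist) * log(cal_dist / dist)

round(psi_bin, 4)  # compare with the psi column of the table
sum(psi_bin)       # overall PSI of the attribute
```

Because the scored data here is a random sample of the development data, the overall PSI is close to zero, well below the 0.1 level that is often quoted as a rule of thumb for a stable attribute.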
head(ltv_woe$df[, c("ltv", "woe.ltv")])
#      ltv woe.ltv
# 2378  74 -0.7337
# 1897  60 -0.7337
# 2551  80 -0.7337
# 2996  83 -0.7337
# 1174  85 -0.7337
# 2073  74 -0.7337
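To make the deployment step less of a black box, here is a minimal sketch of how a specification of this shape could be applied to a raw vector. It assumes each rule is an R expression in which `$X` stands for the attribute, as the printed rules suggest. The helper apply_woe_rules() is hypothetical and written only for illustration; it is not the implementation inside calc_woe().

```r
# Binning specification copied from the output above (rule and woe columns only).
spec <- data.frame(
  rule = c("$X <= 86", "$X > 86 & $X <= 95", "$X > 95 & $X <= 101",
           "$X > 101 & $X <= 106", "$X > 106 & $X <= 115",
           "$X > 115 | is.na($X)"),
  woe  = c(-0.7337, -0.3510, 0.0880, 0.1935, 0.2126, 0.5229),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate each rule against x and assign the bin's WoE.
apply_woe_rules <- function(x, spec) {
  out <- rep(NA_real_, length(x))
  for (i in seq_len(nrow(spec))) {
    # Replace the "$X" placeholder with the local variable name, then evaluate.
    expr <- gsub("$X", "x", spec$rule[i], fixed = TRUE)
    hit  <- eval(parse(text = expr))
    # which() drops NA comparisons, so a missing value only matches
    # the rule that tests is.na() explicitly.
    out[which(hit)] <- spec$woe[i]
  }
  out
}

apply_woe_rules(c(60, 90, 100, 120, NA), spec)
```

Since the rules are evaluated as code, a sketch like this should only ever be fed a specification from a trusted source.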