Deployment of Binning Outcomes in Production

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), I’ve shown different monotonic binning algorithm that I developed over time. However, these binning functions are all useless without a deployment vehicle in production. During the weekend, I finally had time to draft a R function
(https://github.com/statcompute/MonotonicBinning/blob/master/code/calc_woe.R) that can be used to deploy the binning outcome and to apply the WoE transformation to the attribute from an input data frame.

Below is a complete example showing how to apply the binning function mono_bin() to an attribute named “ltv” in the data frame, generate the binning specification, and then deploy the binning logic to calculate the WoE transformation of “ltv”. There are two objects returned from the calc_woe.R() function, the original data frame with an new column named “woe.ltv” and a summary table showing the population stability index (PSI) of the input attribute “ltv”.

While all are welcome to use my R codes and functions for your own purposes, I greatly appreciate it if you could reference the work and acknowledge my efforts.

url <- "https://github.com/statcompute/MonotonicBinning/blob/master/data/accepts.rds?raw=true"
download.file(url, "df.rds", mode = "wb")
df <- readRDS("df.rds")
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/manual_bin.R")
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/mono_bin.R")
ltv_bin <- mono_bin(df, bad, ltv)
ltv_bin$df
# bin rule freq dist mv_cnt bad_freq bad_rate woe iv ks
# 1 01 $X <= 86 1108 0.1898 0 122 0.1101 -0.7337 0.0810 11.0448
# 2 02 $X > 86 & $X <= 95 1081 0.1852 0 166 0.1536 -0.3510 0.0205 16.8807
# 3 03 $X > 95 & $X <= 101 1102 0.1888 0 242 0.2196 0.0880 0.0015 15.1771
# 4 04 $X > 101 & $X <= 106 743 0.1273 0 177 0.2382 0.1935 0.0050 12.5734
# 5 05 $X > 106 & $X <= 115 935 0.1602 0 226 0.2417 0.2126 0.0077 8.9540
# 6 06 $X > 115 | is.na($X) 868 0.1487 1 263 0.3030 0.5229 0.0468 0.0000
source("https://raw.githubusercontent.com/statcompute/MonotonicBinning/master/code/calc_woe.R")
ltv_woe <- calc_woe(df[sample(seq(nrow(df)), 1000), ], ltv, ltv_bin$df)
ltv_woe$psi
# bin rule dist woe cal_freq cal_dist cal_woe psi
# 1 01 $X <= 86 0.1898 -0.7337 188 0.188 -0.7337 0e+00
# 2 02 $X > 86 & $X <= 95 0.1852 -0.3510 179 0.179 -0.3510 2e-04
# 3 03 $X > 95 & $X <= 101 0.1888 0.0880 192 0.192 0.0880 1e-04
# 4 04 $X > 101 & $X <= 106 0.1273 0.1935 129 0.129 0.1935 0e+00
# 5 05 $X > 106 & $X <= 115 0.1602 0.2126 167 0.167 0.2126 3e-04
# 6 06 $X > 115 | is.na($X) 0.1487 0.5229 145 0.145 0.5229 1e-04
head(ltv_woe$df[, c("ltv", "woe.ltv")])
# ltv woe.ltv
# 2378 74 -0.7337
# 1897 60 -0.7337
# 2551 80 -0.7337
# 2996 83 -0.7337
# 1174 85 -0.7337
# 2073 74 -0.7337
view raw woe_deploy.R hosted with ❤ by GitHub

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)