PART – A Rule-Learning Algorithm

statcompute

9 years ago

[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

> require('RWeka')
> require('pROC')
> 
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <- sample(1:nrow(df2), nrow(df2) - 1000)
> set1 <- df2[rows, ]
> set2 <- df2[-rows, ]
> 
> # BUILD A PART RULE MODEL
> mdl1 <- PART(factor(BAD) ~., data = set1)
> print(mdl1)
PART decision list
------------------

EXP_INC > 0.000774 AND
AGE > 21.833334 AND
INCOME > 2100 AND
MAJORDRG <= 0 AND
OWNRENT > 0 AND
MINORDRG <= 1: 0 (2564.0/103.0)

AGE > 21.25 AND
EXP_INC > 0.000774 AND
INCPER > 17010 AND
INCOME > 1774.583333 AND
MINORDRG <= 0: 0 (2278.0/129.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.233759 AND
MINORDRG <= 1: 0 (56.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
SELFEMPL <= 0 AND
OWNRENT > 0: 0 (1123.0/130.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ACADMOS <= 20 AND
ADEPCNT <= 2 AND
MINORDRG > 0 AND
ACADMOS <= 14: 0 (175.0/10.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ADEPCNT <= 0: 0 (1323.0/164.0)

INCOME > 1423 AND
OWNRENT <= 0 AND
MINORDRG <= 1 AND
ADEPCNT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 0: 0 (943.0/124.0)

SELFEMPL > 0 AND
MAJORDRG <= 0 AND
ACADMOS > 85: 0 (24.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
MINORDRG <= 0 AND
INCOME > 2708.333333: 0 (17.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
OWNRENT <= 0 AND
MINORDRG <= 0 AND
INCPER <= 8400: 0 (13.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
ADEPCNT <= 0 AND
MINORDRG <= 0 AND
MAJORDRG <= 0: 0 (107.0/15.0)

OWNRENT <= 0 AND
MINORDRG > 0 AND
MINORDRG <= 1 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
SELFEMPL <= 0: 0 (87.0/13.0)

OWNRENT <= 0 AND
SELFEMPL <= 0 AND
MAJORDRG <= 0 AND
MINORDRG <= 1: 0 (373.0/100.0)

MAJORDRG > 0 AND
MINORDRG > 0 AND
MAJORDRG <= 1 AND
MINORDRG <= 1: 0 (29.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
MAJORDRG <= 0: 0 (199.0/57.0)

OWNRENT <= 0 AND
SELFEMPL <= 0: 0 (84.0/24.0)

MAJORDRG > 1: 0 (17.0/3.0)

ACADMOS <= 34 AND
MAJORDRG > 0: 0 (10.0)

MAJORDRG <= 0 AND
ADEPCNT <= 2 AND
OWNRENT <= 0: 0 (29.0/7.0)

OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.218654 AND
MINORDRG <= 2 AND
MINORDRG <= 1: 0 (8.0/1.0)

OWNRENT > 0 AND
INCOME <= 2041.666667 AND
MAJORDRG > 0 AND
ADEPCNT > 0: 1 (5.0)

OWNRENT > 0 AND
AGE > 33.416668 AND
ACADMOS <= 174 AND
SELFEMPL > 0: 0 (10.0/1.0)

OWNRENT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 1 AND
AGE > 33.5 AND
EXP_INC > 0.006737: 0 (6.0)

EXP_INC > 0.001179: 1 (16.0/1.0)

: 0 (3.0)

Number of Rules  : 	25

> pred1 <- data.frame(prob = predict(mdl1, newdata = set2, type = 'probability')[, 2]) 
> # ROC FOR TESTING SET
> print(roc1 <- roc(set2$BAD, pred1$prob))

Call:
roc.default(response = set2$BAD, predictor = pred1$prob)

Data: pred1$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6794
> 
> # BUILD A LOGISTIC REGRESSION
> mdl2 <- Logistic(factor(BAD) ~., data = set1)
> print(mdl2)
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
               Class
Variable           0
====================
AGE           0.0112
ACADMOS      -0.0005
ADEPCNT      -0.0747
MAJORDRG     -0.2312
MINORDRG     -0.1991
OWNRENT       0.2244
INCOME        0.0004
SELFEMPL     -0.1206
INCPER             0
EXP_INC       0.4472
Intercept     0.7965


Odds Ratios...
               Class
Variable           0
====================
AGE           1.0113
ACADMOS       0.9995
ADEPCNT        0.928
MAJORDRG      0.7936
MINORDRG      0.8195
OWNRENT       1.2516
INCOME        1.0004
SELFEMPL      0.8864
INCPER             1
EXP_INC       1.5639

> pred2 <- data.frame(prob = predict(mdl2, newdata = set2, type = 'probability')[, 2])  
> # ROC FOR TESTING SET
> print(roc2 <- roc(set2$BAD, pred2$prob))

Call:
roc.default(response = set2$BAD, predictor = pred2$prob)

Data: pred2$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6529
> 
> # COMPARE TWO ROCS
> roc.test(roc1, roc2)

	DeLong's test for two correlated ROC curves

data:  roc1 and roc2 
Z = 1.0344, p-value = 0.301
alternative hypothesis: true difference in AUC is not equal to 0 
sample estimates:
AUC of roc1 AUC of roc2 
  0.6793894   0.6528875

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.