[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
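Below is an example showing how to fit a Generalized Linear Model with H2O in R. The output is much more comprehensive than what base R's glm() reports. In addition to the coefficient table with standard errors, z-values, and p-values, H2O reports training-set metrics such as AUC, log loss, and a confusion matrix, along with the thresholds that maximize various classification metrics.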
> library(h2o)
> h2o.init(max_mem_size = "12g")
> df1 <- h2o.uploadFile("Documents/credit_count.txt", header = TRUE, sep = ",", parse_type = "CSV")
> df2 <- h2o.assign(df1[df1$CARDHLDR == 1, ], "glm_df")
> h2o.colnames(df2)
[1] "CARDHLDR" "DEFAULT" "AGE" "ACADMOS" "ADEPCNT" "MAJORDRG"
[7] "MINORDRG" "OWNRENT" "INCOME" "SELFEMPL" "INCPER" "EXP_INC"
[13] "SPENDING" "LOGSPEND"
> Y <- "DEFAULT"
> X <- c("MAJORDRG", "MINORDRG", "INCOME", "OWNRENT")
> dist <- "binomial"
> link <- "logit"
> id <- "h2o_mdl01"
> mdl <- h2o.glm(X, Y, training_frame = h2o.getFrame("glm_df"), model_id = id, family = dist, link = link, lambda = 0, compute_p_values = TRUE, standardize = FALSE)
> show(h2o.getModel(id)@model$coefficients_table)
Coefficients: glm coefficients
      names coefficients std_error    z_value  p_value
1 Intercept    -1.204439  0.090811 -13.263121 0.000000
2  MAJORDRG     0.203135  0.069250   2.933370 0.003353
3  MINORDRG     0.202727  0.047971   4.226014 0.000024
4   OWNRENT    -0.201223  0.071619  -2.809636 0.004960
5    INCOME    -0.000442  0.000040 -10.942350 0.000000
> h2o.performance(h2o.getModel(id))
H2OBinomialMetrics: glm
** Reported on training data. **
MSE: 0.08414496
RMSE: 0.2900775
LogLoss: 0.3036585
Mean Per-Class Error: 0.410972
AUC: 0.6432189
Gini: 0.2864378
R^2: 0.02005004
Residual Deviance: 6376.221
AIC: 6386.221
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error        Rate
0      7703 1800 0.189414  =1800/9503
1       630  366 0.632530    =630/996
Totals 8333 2166 0.231451 =2430/10499
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.126755 0.231499 142
2 max f2 0.075073 0.376556 272
3 max f0point5 0.138125 0.191828 115
4 max accuracy 0.368431 0.905039 0
5 max precision 0.314224 0.250000 3
6 max recall 0.006115 1.000000 399
7 max specificity 0.368431 0.999895 0
8 max absolute_mcc 0.126755 0.128940 142
9 max min_per_class_accuracy 0.106204 0.604546 196
10 max mean_per_class_accuracy 0.103730 0.605663 202
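As a quick sanity check on the coefficient table, the z_value column is each estimate divided by its standard error. The p_value column matches a two-sided p-value from the standard normal distribution. The base-R sketch below reproduces both from values copied by hand out of the printed table. It assumes a normal (Wald) reference distribution, which the printed numbers are consistent with. INCOME's estimate and standard error are printed with very few significant digits, so its recomputed ratio differs from the printed z_value by about 1%.

```r
# Values copied from the H2O coefficient table above (rounded to 6 decimals)
coefs <- data.frame(
  names     = c("Intercept", "MAJORDRG", "MINORDRG", "OWNRENT", "INCOME"),
  estimate  = c(-1.204439, 0.203135, 0.202727, -0.201223, -0.000442),
  std_error = c(0.090811, 0.069250, 0.047971, 0.071619, 0.000040),
  z_printed = c(-13.263121, 2.933370, 4.226014, -2.809636, -10.942350)
)

# Wald statistic: estimate divided by its standard error.
# INCOME is off by ~1% only because its printed values are heavily rounded.
coefs$z_wald <- coefs$estimate / coefs$std_error

# Two-sided p-value from the standard normal, using the printed z values
# (assumes H2O uses a normal reference distribution for the binomial family)
coefs$p_wald <- 2 * pnorm(-abs(coefs$z_printed))

print(coefs, digits = 6)
```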
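Several of the scalar metrics printed by h2o.performance() are simple functions of one another. The base-R snippet below recomputes RMSE, Gini, AIC, log loss, and R^2 from the printed MSE, AUC, and residual deviance, together with the class counts from the confusion matrix. The first four use standard formulas: for a 0/1 response, the residual deviance is -2 times the log-likelihood. Treating H2O's R^2 as 1 - MSE/Var(y) with the population variance of the response is an assumption inferred from the numbers matching, not something stated in the output.

```r
# Values copied from the h2o.performance() printout above
n_obs    <- 10499       # training rows (Totals in the confusion matrix)
n_pos    <- 996         # rows with DEFAULT == 1
n_params <- 5           # intercept + 4 predictors
mse      <- 0.08414496
auc      <- 0.6432189
res_dev  <- 6376.221

rmse    <- sqrt(mse)                 # RMSE is the square root of MSE
gini    <- 2 * auc - 1               # Gini coefficient derived from AUC
aic     <- res_dev + 2 * n_params    # AIC = -2 logLik + 2k = deviance + 2k
logloss <- res_dev / (2 * n_obs)     # mean log loss = deviance / (2n)

# Assumed definition: R^2 = 1 - MSE / Var(y), population variance of 0/1 y
p_hat <- n_pos / n_obs
r2    <- 1 - mse / (p_hat * (1 - p_hat))

print(round(c(RMSE = rmse, Gini = gini, AIC = aic,
              LogLoss = logloss, R2 = r2), 7))
```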
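The confusion matrix can be checked the same way. Using the standard definitions, the snippet below recomputes the per-class error rates, the overall error rate, and the mean per-class error. It also computes the F1 score and Matthews correlation coefficient implied by this matrix. These match the "max f1" and "max absolute_mcc" rows, both reported at threshold 0.126755 (idx 142), which confirms that the printed matrix is the one at the F1-optimal threshold. The counts are copied from the output above.

```r
# Confusion matrix at the F1-optimal threshold, copied from the output above
tn <- 7703; fp <- 1800   # actual 0: predicted 0 / predicted 1
fn <- 630;  tp <- 366    # actual 1: predicted 0 / predicted 1

err0      <- fp / (tn + fp)                    # error rate among actual 0s
err1      <- fn / (fn + tp)                    # error rate among actual 1s
err_total <- (fp + fn) / (tn + fp + fn + tp)   # overall error rate
mean_per_class_error <- (err0 + err1) / 2      # average of the two class errors

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# Matthews correlation coefficient
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(c(err0 = err0, err1 = err1, total = err_total,
              mean_per_class = mean_per_class_error,
              f1 = f1, mcc = mcc), 6))
```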
To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.
