GLM with H2O in R
[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers.]
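The example below fits a generalized linear model (here, a logistic regression) with H2O in R. The output is considerably more comprehensive than what base R's glm() reports: besides the coefficient table, it includes fit metrics such as AUC and log loss, a confusion matrix, and a table of threshold-optimized metrics.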
> library(h2o)
> h2o.init(max_mem_size = "12g")
> df1 <- h2o.uploadFile("Documents/credit_count.txt", header = TRUE, sep = ",", parse_type = "CSV")
> df2 <- h2o.assign(df1[df1$CARDHLDR == 1, ], "glm_df")
> h2o.colnames(df2)
 [1] "CARDHLDR" "DEFAULT"  "AGE"      "ACADMOS"  "ADEPCNT"  "MAJORDRG"
 [7] "MINORDRG" "OWNRENT"  "INCOME"   "SELFEMPL" "INCPER"   "EXP_INC"
[13] "SPENDING" "LOGSPEND"
> Y <- "DEFAULT"
> X <- c("MAJORDRG", "MINORDRG", "INCOME", "OWNRENT")
> dist <- "binomial"
> link <- "logit"
> id <- "h2o_mdl01"
> mdl <- h2o.glm(X, Y, training_frame = h2o.getFrame("glm_df"), model_id = id,
+                family = dist, link = link, lambda = 0, compute_p_values = TRUE,
+                standardize = FALSE)
> show(h2o.getModel(id)@model$coefficients_table)
Coefficients: glm coefficients
      names coefficients std_error    z_value  p_value
1 Intercept    -1.204439  0.090811 -13.263121 0.000000
2  MAJORDRG     0.203135  0.069250   2.933370 0.003353
3  MINORDRG     0.202727  0.047971   4.226014 0.000024
4   OWNRENT    -0.201223  0.071619  -2.809636 0.004960
5    INCOME    -0.000442  0.000040 -10.942350 0.000000
> h2o.performance(h2o.getModel(id))
H2OBinomialMetrics: glm
** Reported on training data. **

MSE:  0.08414496
RMSE:  0.2900775
LogLoss:  0.3036585
Mean Per-Class Error:  0.410972
AUC:  0.6432189
Gini:  0.2864378
R^2:  0.02005004
Residual Deviance:  6376.221
AIC:  6386.221

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error          Rate
0      7703 1800 0.189414    =1800/9503
1       630  366 0.632530      =630/996
Totals 8333 2166 0.231451  =2430/10499

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.126755 0.231499 142
2                       max f2  0.075073 0.376556 272
3                 max f0point5  0.138125 0.191828 115
4                 max accuracy  0.368431 0.905039   0
5                max precision  0.314224 0.250000   3
6                   max recall  0.006115 1.000000 399
7              max specificity  0.368431 0.999895   0
8             max absolute_mcc  0.126755 0.128940 142
9   max min_per_class_accuracy  0.106204 0.604546 196
10 max mean_per_class_accuracy  0.103730 0.605663 202
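Several of the numbers in this output are tied together by standard identities, which makes for a quick sanity check when reading it. The sketch below uses base R only (no H2O cluster needed) and recomputes a few metrics from the others. The variable names are my own, the values are copied from the output above, and the log loss and AIC lines assume that the reported residual deviance equals -2 times the log-likelihood, as is usual for a binary response.

```r
# Values copied from the h2o.performance() output above.
reported <- c(
  rmse                 = 0.2900775,
  gini                 = 0.2864378,
  logloss              = 0.3036585,
  aic                  = 6386.221,
  mean_per_class_error = 0.410972,
  max_f1               = 0.231499
)

mse      <- 0.08414496
auc      <- 0.6432189
deviance <- 6376.221

# Confusion matrix at the F1-optimal threshold (rows: actual, columns: predicted).
cm <- matrix(c(7703, 1800,
                630,  366),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

n <- sum(cm)  # number of training rows
k <- 5        # estimated coefficients: intercept plus four predictors

# Per-class error: share of each actual class that was misclassified.
class_error <- c(cm["0", "1"], cm["1", "0"]) / rowSums(cm)

tp <- cm["1", "1"]  # true positives
fp <- cm["0", "1"]  # false positives
fn <- cm["1", "0"]  # false negatives

recomputed <- c(
  rmse                 = sqrt(mse),
  gini                 = 2 * auc - 1,
  logloss              = deviance / (2 * n),  # assumes deviance = -2 * log-likelihood
  aic                  = deviance + 2 * k,    # same assumption
  mean_per_class_error = mean(class_error),
  max_f1               = 2 * tp / (2 * tp + fp + fn)
)

comparison <- data.frame(reported, recomputed,
                         abs_diff = abs(reported - recomputed))
print(comparison, digits = 7)
```

Each recomputed value should agree with the reported one up to the rounding in the printed output. Note that the confusion matrix is evaluated at the F1-optimal threshold (0.126755), which is why its F1 score matches the "max f1" row of the last table.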