Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience.
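Logistic regression is used to analyze the relationship between a dichotomous dependent variable and one or more categorical or continuous independent variables. It specifies the probability of the response as a function of the predictors. The model is expressed as

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

where $p$ is the probability of the outcome of interest and $x_1, \dots, x_k$ are the predictors. The model's estimates, the coefficients $\beta_0, \dots, \beta_k$, are on the log-odds scale; exponentiating a coefficient gives an odds ratio.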
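Logistic regression works on the log-odds scale, so it helps to see how log-odds, probabilities, and odds ratios convert into one another. The following is a minimal sketch using base R's `qlogis()` (logit) and `plogis()` (inverse logit). The coefficients `beta0` and `beta1` and the predictor value `x` are made up purely for illustration; they are not estimates from any dataset.

```r
# Hypothetical coefficients (illustrative only, not fitted values)
beta0 <- -1.5   # intercept
beta1 <- 0.04   # slope for a single continuous predictor x

x <- 35                          # hypothetical predictor value
log_odds <- beta0 + beta1 * x    # linear predictor on the log-odds scale
p <- plogis(log_odds)            # inverse logit: 1 / (1 + exp(-log_odds))

# qlogis() is the logit function, so it recovers the log-odds from p
all.equal(qlogis(p), log_odds)

# A one-unit increase in x multiplies the odds by exp(beta1)
odds <- function(prob) prob / (1 - prob)
p_next <- plogis(beta0 + beta1 * (x + 1))
all.equal(odds(p_next) / odds(p), exp(beta1))
```

Both comparisons should hold up to floating-point tolerance, which is why exponentiated coefficients are read as odds ratios.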
Using the GermanCredit dataset in the caret package, we will construct a logistic regression model to estimate the probability that a consumer is a good loan applicant, based on a number of predictor variables.
library(caret)
data(GermanCredit)

# Stratified 60/40 split into training and testing sets
Train <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)
training <- GermanCredit[ Train, ]
testing <- GermanCredit[ -Train, ]

mod_fit_one <- glm(Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own + CreditHistory.Critical, data=training, family="binomial")

summary(mod_fit_one) # estimates
exp(coef(mod_fit_one)) # odds ratios
predict(mod_fit_one, newdata=testing, type="response") # predicted probabilities
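How the three outputs above relate (coefficients, odds ratios, predicted probabilities) can be checked on simulated data where the true coefficients are known. The sketch below uses only base R. The data-generating values and the variable names (`age`, `good`, `sim`, `fit`) are assumptions for illustration and have nothing to do with GermanCredit.

```r
set.seed(42)                           # make the simulation reproducible

n    <- 5000
age  <- runif(n, 20, 70)               # hypothetical continuous predictor
p    <- plogis(-2 + 0.05 * age)        # true probabilities from known coefficients
good <- rbinom(n, size = 1, prob = p)  # simulated binary outcome

sim <- data.frame(good = good, age = age)
fit <- glm(good ~ age, data = sim, family = "binomial")

coef(fit)                              # estimates on the log-odds scale
exp(coef(fit))                         # odds ratios

# type = "response" returns probabilities; type = "link" returns log-odds
prob_hat <- predict(fit, newdata = sim, type = "response")
link_hat <- predict(fit, newdata = sim, type = "link")
all.equal(prob_hat, plogis(link_hat))
```

With a sample this large, the fitted slope for `age` should land close to the true value of 0.05, and the predicted probabilities are simply the inverse logit of the linear predictor.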
Great, we’re all done, right? Not just yet. There are some critical questions that still remain. Is the model any good? How well does the model fit the data? Which predictors are most important? Are the predictions accurate? In the next few posts, I’ll provide an overview of how to evaluate logistic regression models in R.