Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Logistic regression is a statistical method used for predicting the probability of a binary outcome. It’s a fundamental tool in machine learning and statistics, often employed in various fields such as healthcare, finance, and marketing. We use logistic regression when we want to understand the relationship between one or more independent variables and a binary outcome, which can be “yes/no,” “1/0,” or any two-class distinction.
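For reference, the single-predictor model fitted later in this post can be written on the log-odds (logit) scale as follows. Here p(x) is the probability that y = 1, and β0 and β1 are the intercept and slope that glm() estimates:

```latex
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

The right-hand form is the S-shaped curve plotted at the end of this post.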
Getting Started
Before we plot the logistic regression curve, we need some data. If you already have a dataset, you can load it as shown below. If you don't, you can practice with one of the many sample datasets available online.
Load the Data
In R, we use the read.csv function to load a CSV file into a data frame. For example, if you have a dataset called “mydata.csv,” you can load it like this:
# Load the data into a data frame
data <- read.csv("mydata.csv")
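If you want to try read.csv without having a "mydata.csv" on disk, the sketch below writes a tiny hypothetical CSV to a temporary file and reads it back. The column names x and y and the values are made up for illustration. It also checks that the outcome is coded 0/1, since glm() with family = binomial expects a 0/1, logical, or factor response.

```r
# Hypothetical example data: write a tiny CSV to a temporary file so that
# read.csv() can be run without an existing "mydata.csv".
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(x = c(1.2, 3.4, 5.6, 7.8), y = c(0, 0, 1, 1)),
  path,
  row.names = FALSE
)

# Read the file back into a data frame
mydata <- read.csv(path)
str(mydata)

# The assumed outcome column "y" should contain only 0s and 1s
# before it is used in a binomial glm().
stopifnot(all(mydata$y %in% c(0, 1)))
```

In your own data, replace path with the location of your file and y with the name of your outcome column.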
In this post, we will instead simulate a dataset so that the example is reproducible:
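library(dplyr)

# Simulate 100 observations: x is uniform on [0, 10], and y is a 0/1 outcome
# whose success probability is 1 / (1 + exp(-(0.5 * x - 2.5)))
set.seed(123)
df <- tibble(
  x = runif(100, 0, 10),
  y = rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
)

head(df)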
# A tibble: 6 × 2
      x     y
  <dbl> <int>
1 2.88      0
2 7.88      1
3 4.09      0
4 8.83      0
5 9.40      1
6 0.456     0
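Because the data were simulated from a known logistic curve (intercept -2.5, slope 0.5), it can help to have that true curve on hand for comparison with the fitted one. The sketch below uses base R's plogis(), the logistic (inverse-logit) function, which gives the same values as the hand-written formula in the simulation. The helper name true_prob is hypothetical.

```r
# Success probability used to simulate y above:
# P(y = 1 | x) = 1 / (1 + exp(-(0.5 * x - 2.5)))
# plogis() is base R's logistic (inverse-logit) function.
true_prob <- function(x) plogis(0.5 * x - 2.5)

x_check <- c(0, 5, 10)
manual <- 1 / (1 + exp(-1 * (0.5 * x_check - 2.5)))

all.equal(true_prob(x_check), manual)  # TRUE
true_prob(5)                           # 0.5, because 0.5 * 5 - 2.5 = 0
```

So the true curve crosses a probability of 0.5 at x = 5, which gives a reference point for the fitted curve later on.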
Fit a Logistic Regression Model
Next, we fit a logistic regression model to our data with the glm (Generalized Linear Model) function. Setting family = binomial uses the logit link by default, so the model describes the log-odds of a “success” (y = 1) as a linear function of the single predictor x.
# Fit a logistic regression model (binomial family, logit link by default)
model <- glm(y ~ x, data = df, family = binomial)

# Model-level fit statistics
broom::glance(model)
# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          138.      99  -51.5  107.  112.     103.          98   100
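The columns of glance() are linked. For a 0/1 outcome the saturated model's log-likelihood is 0, so the deviance equals -2 × logLik (here -2 × -51.5 = 103), AIC adds 2 per estimated coefficient (103 + 4 = 107), and BIC adds log(nobs) per coefficient. The drop from the null deviance (138) to the residual deviance (103) is a likelihood-ratio statistic for the slope. The sketch below checks these relationships with base R accessors. It rebuilds the simulated data with data.frame() instead of tibble(), assuming the same random-number calls in the same order.

```r
# Rebuild the simulated data and model from this post (base R only)
set.seed(123)
x <- runif(100, 0, 10)
y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
df <- data.frame(x = x, y = y)
model <- glm(y ~ x, data = df, family = binomial)

ll <- as.numeric(logLik(model))
k <- length(coef(model))  # 2 estimated coefficients: intercept and slope

# For 0/1 outcomes: deviance = -2 * logLik, AIC = -2 * logLik + 2k,
# BIC = -2 * logLik + log(n) * k
checks <- c(
  deviance = isTRUE(all.equal(deviance(model), -2 * ll)),
  aic      = isTRUE(all.equal(AIC(model), -2 * ll + 2 * k)),
  bic      = isTRUE(all.equal(BIC(model), -2 * ll + log(nobs(model)) * k))
)
checks

# Likelihood-ratio test of the slope: drop in deviance versus the
# intercept-only model, compared with a chi-squared distribution
lr_stat <- model$null.deviance - model$deviance
p_value <- pchisq(lr_stat, df = model$df.null - model$df.residual,
                  lower.tail = FALSE)
c(lr_stat = lr_stat, p_value = p_value)
```

The same test is available as anova(model, test = "Chisq").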
broom::tidy(model)
# A tibble: 2 × 5
  term        estimate std.error statistic     p.value
  <chr>          <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)   -2.63      0.571     -4.60 0.00000422
2 x              0.505     0.102      4.96 0.000000699
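As a worked reading of these estimates, using the rounded values printed above: the slope is on the log-odds scale, so exponentiating it gives an odds ratio. The fitted curve crosses a probability of 0.5 where the linear predictor equals zero:

```latex
\widehat{\mathrm{OR}} = e^{\hat\beta_1} = e^{0.505} \approx 1.66,
\qquad
\hat p(x) = 0.5 \iff \hat\beta_0 + \hat\beta_1 x = 0
\iff x = -\frac{\hat\beta_0}{\hat\beta_1} = \frac{2.63}{0.505} \approx 5.21
```

In words, each one-unit increase in x multiplies the estimated odds of success by about 1.66, and the fitted curve passes 50% near x ≈ 5.2. Both values are close to those used to simulate the data (slope 0.5, crossing at x = 5).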
head(broom::augment(model), 1) |> dplyr::glimpse()
Rows: 1
Columns: 8
$ y          <int> 0
$ x          <dbl> 2.875775
$ .fitted    <dbl> -1.175925
$ .resid     <dbl> -0.7333581
$ .hat       <dbl> 0.01969748
$ .sigma     <dbl> 1.028093
$ .cooksd    <dbl> 0.003162007
$ .std.resid <dbl> -0.7406892
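Note that .fitted is not a probability: by default, augment() reports fitted values for a glm on the link (log-odds) scale. As a hand check for the first observation (x ≈ 2.876), the rounded coefficients give roughly the printed .fitted value, and applying the inverse logit to the printed value turns it into a probability:

```latex
\hat\eta_1 = -2.63 + 0.505 \times 2.875775 \approx -1.178,
\qquad
\hat p_1 = \frac{1}{1 + e^{-\hat\eta_1}} = \frac{1}{1 + e^{1.175925}} \approx 0.236
```

The small gap between -1.178 and the printed -1.175925 comes from rounding the coefficients to three significant figures.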
Predict Probabilities
Now that we have our model, we can use it to predict probabilities. We’ll create a fine grid of values for the predictor x and, for each value, predict the probability of success (y = 1).
# Create a grid of predictor values from 0 to 10 in steps of 0.01
x_seq <- seq(0, 10, 0.01)

# Predict probabilities (type = "response" returns probabilities, not log-odds)
probabilities <- predict(
  model,
  newdata = data.frame(x = x_seq),
  type = "response"
)

head(x_seq)
[1] 0.00 0.01 0.02 0.03 0.04 0.05
head(probabilities)
         1          2          3          4          5          6 
0.06732923 0.06764710 0.06796636 0.06828702 0.06860908 0.06893255 
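The predict function evaluates the fitted model at each new value of x. With type = "response" it returns probabilities between 0 and 1; the default, type = "link", would return log-odds instead.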
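The sketch below makes the relationship between the two prediction scales explicit. It predicts log-odds with type = "link", confirms that applying plogis() gives the same values as type = "response", and recomputes the linear predictor from coef(). As in the earlier sketches, it rebuilds the simulated data with base R so that it runs on its own.

```r
# Rebuild the simulated data and model from this post (base R only)
set.seed(123)
x <- runif(100, 0, 10)
y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
df <- data.frame(x = x, y = y)
model <- glm(y ~ x, data = df, family = binomial)

x_seq <- seq(0, 10, 0.01)
new_data <- data.frame(x = x_seq)

# type = "link": the linear predictor b0 + b1 * x on the log-odds scale
log_odds <- predict(model, newdata = new_data, type = "link")

# type = "response": the same values passed through the inverse logit
probabilities <- predict(model, newdata = new_data, type = "response")

# Both routes give the same probabilities
all.equal(unname(probabilities), plogis(unname(log_odds)))  # TRUE

# The linear predictor can also be computed by hand from the coefficients
b <- coef(model)
all.equal(unname(log_odds), unname(b[1] + b[2] * x_seq))    # TRUE
```

Working on the link scale is also the usual starting point for confidence bands, because intervals built there stay within 0 and 1 after back-transformation.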
Plot the Logistic Regression Curve
Finally, let’s plot the logistic regression curve. We’ll use the plot function to create a scatter plot of the data points, and then overlay the logistic curve with the lines function.
# Plot the data points
plot(
  df$x, df$y,
  pch = 16, col = "blue",
  xlab = "Predictor Variable",
  ylab = "Probability of Success"
)

# Add the logistic regression curve
lines(x_seq, probabilities, col = "red", lwd = 2)
And there you have it! You’ve plotted a logistic regression curve in base R. The blue dots are the observed outcomes (0 or 1) at each value of x, and the red curve shows how the fitted probability of success changes with the predictor.
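If you also want to show uncertainty around the curve, one common approach is a pointwise confidence band. The sketch below is one way to do it in base R, assuming an approximate 95% Wald interval is acceptable: it predicts on the log-odds scale with se.fit = TRUE, adds and subtracts 1.96 standard errors there, and back-transforms with plogis() so the band stays between 0 and 1. The shading via polygon() and adjustcolor() is a styling choice, not part of the original post.

```r
# Rebuild the simulated data and model from this post (base R only)
set.seed(123)
x <- runif(100, 0, 10)
y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
df <- data.frame(x = x, y = y)
model <- glm(y ~ x, data = df, family = binomial)

x_seq <- seq(0, 10, 0.01)

# Predict on the link (log-odds) scale, with standard errors
pred <- predict(model, newdata = data.frame(x = x_seq),
                type = "link", se.fit = TRUE)

# Approximate 95% Wald interval on the log-odds scale, back-transformed
z <- qnorm(0.975)
fit   <- plogis(pred$fit)
lower <- plogis(pred$fit - z * pred$se.fit)
upper <- plogis(pred$fit + z * pred$se.fit)

# Data points, shaded band, then the fitted curve on top
plot(df$x, df$y, pch = 16, col = "blue",
     xlab = "Predictor Variable", ylab = "Probability of Success")
polygon(c(x_seq, rev(x_seq)), c(lower, rev(upper)),
        col = adjustcolor("red", alpha.f = 0.2), border = NA)
lines(x_seq, fit, col = "red", lwd = 2)
```

Building the interval on the log-odds scale rather than directly on the probability scale is what keeps the band's limits inside [0, 1].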
Conclusion
I encourage you to try this out with your own dataset. Logistic regression is a powerful tool for modeling binary outcomes, and visualizing the curve helps you understand the relationship between your predictor variable and the probability of success. Experiment with different datasets and predictor variables to gain a deeper understanding of this essential statistical technique.
Remember, practice makes perfect, and the more you work with logistic regression in R, the more proficient you’ll become. Happy coding!