Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf, it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning.
What’s AUC anyway?
Consider:
- A dataset $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i$ is a vector of features collected for the $i$th subject, and $y_i \in \{0, 1\}$ is the $i$th subject's label (binary outcome variable of interest, like a disease status, class membership, or whatever binary label).
- A classification algorithm (like logistic regression, SVM, deep neural net, or whatever you like), trained on $D$, that assigns a score (or probability) $s(\mathbf{x})$ to any new observation $\mathbf{x}$ signifying how likely its label is $y = 1$.
Then:
- A decision threshold (or operating point) $t$ can be chosen to assign a class label ($y = 0$ or $y = 1$) to $\mathbf{x}$ based on the value of $s(\mathbf{x})$. The chosen threshold determines the balance between how many false positives and false negatives will result from this classification.
- Plotting the true positive rate (TPR) against the false positive rate (FPR) as the operating point changes from its minimum to its maximum value yields the receiver operating characteristic (ROC) curve. Check the confusion matrix if you are not sure what TPR and FPR refer to.
- The area under the ROC curve, or AUC, is used as a measure of classifier performance.
Here is some R code for clarification (not even using tidyverse :stuck_out_tongue:):
# load some data, fit a logistic regression classifier
data(iris)
versicolor_virginica <- iris[iris$Species != "setosa", ]
logistic_reg_fit <- glm(Species ~ Sepal.Width + Sepal.Length,
                        data = versicolor_virginica, family = "binomial")
y <- ifelse(versicolor_virginica$Species == "versicolor", 0, 1)
y_pred <- logistic_reg_fit$fitted.values

# get TPR and FPR at different values of the decision threshold
threshold <- seq(0, 1, length = 100)
FPR <- sapply(threshold, function(thresh) {
  sum(y_pred >= thresh & y != 1) / sum(y != 1)
})
TPR <- sapply(threshold, function(thresh) {
  sum(y_pred >= thresh & y == 1) / sum(y == 1)
})

# plot an ROC curve
plot(FPR, TPR)
lines(FPR, TPR)
A rather ugly ROC curve emerges:
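In case the confusion matrix mentioned above is unfamiliar, here is a minimal sketch that tabulates it at a single decision threshold and reads off TPR and FPR. It reuses y and y_pred from the code above; the cutoff of 0.5 is an arbitrary choice for illustration, not part of the original example.

# confusion matrix at one fixed decision threshold (0.5 chosen arbitrarily)
y_hat <- as.integer(y_pred >= 0.5)
table(predicted = y_hat, actual = y)

# TPR = TP / (TP + FN), FPR = FP / (FP + TN)
TP <- sum(y_hat == 1 & y == 1); FN <- sum(y_hat == 0 & y == 1)
FP <- sum(y_hat == 1 & y == 0); TN <- sum(y_hat == 0 & y == 0)
c(TPR = TP / (TP + FN), FPR = FP / (FP + TN))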
The area under the ROC curve, or AUC, seems like a nice heuristic to evaluate and compare the overall performance of classification models independent of the exact decision threshold chosen. But there's more to it.
Probabilistic interpretation
As above, assume that we are looking at a dataset where we want to distinguish data points of type 0 from those of type 1. Consider a classification algorithm that assigns to a random observation $\mathbf{x}$ a score (or probability) $s(\mathbf{x}) \in [0, 1]$, and suppose that the scores of class-1 and class-0 observations follow distributions with densities $f_1$ and $f_0$, respectively. Then the true positive rate and false positive rate at a decision threshold $t$ are

$$\mathrm{TPR}(t) = P\big(s(\mathbf{x}) \geq t \mid y = 1\big) = \int_t^{\infty} f_1(s)\, ds, \qquad \mathrm{FPR}(t) = P\big(s(\mathbf{x}) \geq t \mid y = 0\big) = \int_t^{\infty} f_0(s)\, ds.$$

The ROC curve simply plots $\mathrm{TPR}(t)$ against $\mathrm{FPR}(t)$ as the threshold $t$ varies, and the area under it is

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\, d(\mathrm{FPR}) = \int_{-\infty}^{\infty} \mathrm{TPR}(t)\, f_0(t)\, dt = \int_{-\infty}^{\infty} \int_t^{\infty} f_1(s)\, f_0(t)\, ds\, dt = P(S_1 > S_0),$$

where $S_1$ and $S_0$ are the scores the algorithm assigns to a randomly chosen class-1 and class-0 observation, respectively. So, given a randomly chosen observation $\mathbf{x}_1$ from class 1 and a randomly chosen observation $\mathbf{x}_0$ from class 0, the AUC is the probability that the algorithm assigns a higher score to $\mathbf{x}_1$ than to $\mathbf{x}_0$.
In other words, if the classification algorithm is used to distinguish "positive" from "negative" examples (e.g., by disease status), then
AUC is the probability of correct ranking of a random “positive”-“negative” pair.
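To see this interpretation in action, one can approximate the AUC by repeatedly drawing one "positive" and one "negative" observation at random and checking how often the classifier ranks them correctly. The following sketch reuses y and y_pred from the logistic regression example above; the number of Monte Carlo draws is an arbitrary choice of mine.

# Monte Carlo check of the probabilistic interpretation:
# P(score of a random "positive" > score of a random "negative")
set.seed(1)
n_sim <- 100000
pos_scores <- sample(y_pred[y == 1], n_sim, replace = TRUE)
neg_scores <- sample(y_pred[y == 0], n_sim, replace = TRUE)
# count tied pairs as 1/2, as in the pair-counting formula below
mean((pos_scores > neg_scores) + 0.5 * (pos_scores == neg_scores))
# should come out close to the AUC computed below (~0.79)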
Computing AUC
The above probabilistic interpretation suggests a simple formula to compute AUC on a finite sample.
Among all "positive"-"negative" pairs in the dataset, compute the proportion of those which are ranked correctly by the evaluated classification algorithm.
Here is an inefficient implementation using results from the above logistic regression example:
s <- 0
for (i in which(y == 1)) {
  for (j in which(y == 0)) {
    if (y_pred[i] > y_pred[j]) {
      s <- s + 1
    } else if (y_pred[i] == y_pred[j]) {
      s <- s + 0.5
    }
  }
}
s <- s / (sum(y == 1) * sum(y == 0))
s
# [1] 0.7918
The proportion of correctly ranked "positive"-"negative" pairs yields an estimated AUC of 0.7918.
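As a side note, the same proportion can be computed without the double loop via the well-known relation between AUC and the Mann-Whitney U statistic. Here is a quick sketch of that shortcut (not part of the original example):

# rank-based (Mann-Whitney) computation of the same proportion;
# average ranks give tied pairs the same 0.5 credit as above
n1 <- sum(y == 1)
n0 <- sum(y == 0)
r <- rank(y_pred)
(sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
# should reproduce the pair-counting estimate of 0.7918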
We can compare this value to the area under the ROC curve computed with the trapezoidal rule.
s <- 0
for (i in 1:(length(FPR) - 1)) {
  dFPR <- abs(FPR[i + 1] - FPR[i])
  s <- s + 0.5 * dFPR * (TPR[i + 1] + TPR[i])
}
s
# [1] 0.7922
The trapezoidal rule yields an estimated AUC of 0.7922.
Since there is a minor disagreement, let's use a standard R package to compute AUC.
library(ROCR)
pred <- prediction(y_pred, y)
auc <- as.numeric(performance(pred, measure = "auc")@y.values)
auc
# [1] 0.7918
Same as the proportion of correctly ranked pairs! 😀
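As an aside, the small gap between 0.7922 and 0.7918 comes from evaluating the ROC curve only on the fixed grid seq(0, 1, length = 100). If the observed scores themselves are used as cut points, the trapezoidal area should match the pair-counting estimate; here is a quick sketch of that (not part of the original example):

# exact ROC curve: use the observed scores as thresholds
thr <- c(Inf, sort(unique(y_pred), decreasing = TRUE), -Inf)
FPR_exact <- sapply(thr, function(t) sum(y_pred >= t & y == 0) / sum(y == 0))
TPR_exact <- sapply(thr, function(t) sum(y_pred >= t & y == 1) / sum(y == 1))

# trapezoidal rule on the exact curve
sum(diff(FPR_exact) * (head(TPR_exact, -1) + tail(TPR_exact, -1)) / 2)
# should agree with the pair-counting AUC of 0.7918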
So what? Why care about AUC anyway?
- It has a fu*** nice probabilistic meaning!
Besides, as a measure of classification performance, AUC has many advantages compared to other "single number" performance measures:
- Independence of the decision threshold.
- Invariance to prior class probabilities or class prevalence in the data.
- Can choose/change a decision threshold based on cost-benefit analysis after model training (see the short sketch after this list).
- Extensively used in machine learning and in medical research, and for good reasons, as explained for example in an excellent blog post on deep learning research in medicine by Luke Oakden-Rayner.
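For instance, here is a minimal sketch of the kind of post-hoc threshold selection mentioned above. It reuses y, y_pred, and the threshold grid from the example, and assumes (purely hypothetically) that a false negative is five times as costly as a false positive:

# pick the decision threshold that minimizes a hypothetical expected cost,
# with a false negative costing 5 times as much as a false positive
cost_fn <- 5
cost_fp <- 1
total_cost <- sapply(threshold, function(thresh) {
  fp <- sum(y_pred >= thresh & y == 0)
  fn <- sum(y_pred < thresh & y == 1)
  cost_fp * fp + cost_fn * fn
})
threshold[which.min(total_cost)]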