[This article was first published on Colman Statistics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The AUC* or concordance statistic c is the most commonly used measure for diagnostic accuracy of quantitative tests. It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest. Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.
It is very well known that a test with no better accuracy than chance has an AUC of 0.5, and a test with perfect accuracy has an AUC of 1. But what is the exact interpretation of an AUC of for example 0.88? Did you know that the AUC is completely equivalent with the Mann-Whitney U test statistic?
*AUC: the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Example
Around 27% of the patients with liver cirrhosis will develop Hepatocellular Carcinoma (HCC) within 5 years of follow-up. With our biomarker “peakA” we would like to predict which patients will develop HCC, and which won’t. We will assess the diagnostic accuracy of biomarker “peakA” using the AUC. To keep things visually clear, we suppose we have a dataset of only 12 patients. Four patients did develop HCC (the “cases”) and 8 didn’t (the “controls”). (fictive data)HCC | Biomarker_value |
---|---|
0 | 1.063 |
1 | 1.132 |
1 | 1.122 |
1 | 1.058 |
0 | 0.988 |
0 | 1.182 |
0 | 1.037 |
0 | 1.052 |
0 | 0.925 |
1 | 1.232 |
0 | 0.911 |
0 | 0.967 |
The ROC curve
Now let’s verify that the AUC is indeed equal to 0.875 in the classical way, by plotting a ROC curve and calculating the estimated AUC using the ROCR package. The ROC curve plots the False Positive Rate (FPR) on the X-axis and the True Postive Rate (TPR) on the Y-axis for all possible thresholds (or cutoff values).- True Positive Rate (TPR) or sensitivity: the proportion of actual positives that are correctly identified as such.
- True Negative Rate (TNR) or specificiy: the proportion of actual negatives that are correctly identified as such.
- False Positive Rate (FPR) or 1-specificity: the proportion of actual negatives that are wrongly identified as positives.
library(ROCR) pred <- prediction(df$Biomarker_value, df$HCC ) perf <- performance(pred,"tpr","fpr") plot(perf,col="black") abline(a=0, b=1, col="#8AB63F")
auc<- performance( pred, c("auc")) unlist(slot(auc , "y.values")) [1] 0.875The estimated AUC based on this ROC curve is indeed equal to 0.875, the proportion of pairs for which the value of “PeakA” is larger for HCC compared to NoHCC.
Relation to the cutoff points of the biomarker
Visualizing the sensitivity and specificity as a function of the cutoff points of the biomarker results in a plot that is at least as informative as a ROC curve and (in my opinion) easier to interpret. The plot can be created using the ROCR package.library(ROCR) testy <- performance(pred,"tpr","fpr")Using the str() function, we see that the following slots are part of the testy object:
- alpha.values: Cutoff
- x.values: Specificity or True Negative Rate
- y.values: Sensitivity or True Positive Rate
plot(testy@alpha.values[[1]], testy@x.values[[1]], type='n', xlab='Cutoff points of the biomarker', ylab='sensitivity or specificity') lines(testy@alpha.values[[1]], testy@y.values[[1]], type='s', col="#1A425C", lwd=2) lines(testy@alpha.values[[1]], 1-testy@x.values[[1]], type='s', col="#8AB63F", lwd=2) legend(1.15,.85, c('sensitivity', 'specificity'), lty=c(1,1), col=c("#1A425C", "#8AB63F"), cex=.9, bty='n')
Mann-Whitney U test statistic
The Mann-Whitney U test statistic (or Wilcoxon or Kruskall-Wallis test statistic) is equivalent to the AUC (Mason, 2002). The AUC can be calculated from the output of the wilcox.test() function:wt <-wilcox.test(data=df, df$Biomarker_value ~ df$HCC) 1 - wt$statistic/(sum(df$HCC==1)*sum(df$HCC==0)) W 0.875The p-value of the Mann-Whitney U test can thus safely be used to test whether the AUC differs significantly from 0.5 (AUC of an uninformative test).
wt <-wilcox.test(data=df, df$Biomarker_value ~ df$HCC) wt$p.value [1] 0.04848485
Simulation: the completely uninformative test.
Now, let’s have a look how our plots look like if our biomarker is not informative at all. Data creation:#simulation of the data set.seed(12345) HCC <- rbinom (n=12, size=1, prob=0.27) Biomarker_value <- rnorm (12,mean=1,sd=0.1) + HCC*0 # replacing the zero by a value would make the test informative df<-data.frame (HCC, Biomarker_value) library(knitr) kable (head(df))
HCC | Biomarker_value |
---|---|
0 | 1.0630099 |
1 | 0.9723816 |
1 | 0.9715840 |
1 | 0.9080678 |
0 | 0.9883752 |
0 | 1.1817312 |
newdf<- expand.grid (Biomarker_value [df$HCC==0],Biomarker_value [df$HCC==1]) colnames(newdf)<- c("NoHCC", "HCC") newdf$Pair <- seq(1,dim(newdf)[1])For each pair the values of the biomarker are compared between case and control:
newdf$Comparison <- 1*(newdf$HCC>newdf$NoHCC) mean(newdf$Comparison) [1] 0.40625 newdf$Comparison<-factor(newdf$Comparison, labels=c("HCC>NoHCC","HCC<=NoHCC")) library (knitr) kable(head(newdf,4))
NoHCC | HCC | Pair | Comparison |
---|---|---|---|
1.0630099 | 0.9723816 | 1 | HCC>NoHCC |
0.9883752 | 0.9723816 | 2 | HCC>NoHCC |
1.1817312 | 0.9723816 | 3 | HCC>NoHCC |
1.0370628 | 0.9723816 | 4 | HCC>NoHCC |
library(data.table) longdf = melt(newdf, id.vars = c("Pair", "Comparison"), variable.name = "Group", measure.vars = c("HCC", "NoHCC")) lab<-paste("AUC = Proportion \n of green lines \nAUC=", round(table(newdf$Comparison)[2]/sum(table(newdf$Comparison)),3)) library(ggplot2) fav.col=c("#1A425C", "#8AB63F") ggplot(longdf, aes(x=Group, y=value))+geom_line(aes(group=Pair, col=Comparison)) + scale_color_manual(values=fav.col)+theme_bw() + ylab("Biomarker value") + geom_text(x=0.75,y=0.95,label=lab) + geom_point(shape=21, size=2) + theme(legend.title=element_blank(), legend.position="bottom")
library(ROCR) pred <- prediction(df$Biomarker_value, df$HCC ) perf <- performance(pred,"tpr","fpr") plot(perf,col="black") abline(a=0, b=1, col="#8AB63F")
auc<- performance( pred, c("auc")) unlist(slot(auc , "y.values")) [1] 0.40625Sensitivity and specificity as a function of the cutoff points of the biomarker:
library(ROCR) plot(testy@alpha.values[[1]], testy@x.values[[1]], type='n', xlab='Cutoff points of the biomarker', ylab='sensitivity or specificity') lines(testy@alpha.values[[1]], testy@y.values[[1]], type='s', col="#1A425C", lwd=2) lines(testy@alpha.values[[1]], 1-testy@x.values[[1]], type='s', col="#8AB63F", lwd=2) legend(1.15,.85, c('sensitivity', 'specificity'), lty=c(1,1), col=c("#1A425C", "#8AB63F"), cex=.9, bty='n')
wt <-wilcox.test(data=df, df$Biomarker_value ~ df$HCC) 1 - wt$statistic/(sum(df$HCC==1)*sum(df$HCC==0)) W 0.40625 wt$p.value [1] 0.6828283
General remarks on the AUC
- Often, a combination of new markers is selected from a large set. This can result in overoptimistic expectations of the marker’s performance. Any performance measure should be estimated with correction for optimism, for example by applying cross-validation or bootstrap resampling. However, validation in fully independent, external data is the best way to validate a new marker.
- When we want to assess the incremental value of a additional marker (e.g. molecular, genetic, imaging) to an existing model, the increase of the AUC can be reported.
References
Mason, S. J. and Graham, N. E. (2002), Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Q.J.R. Meteorol. Soc., 128: 2145-2166. Steyerberg, Ewout W. et al. “Assessing the Performance of Prediction Models: A Framework for Some Traditional and Novel Measures.” Epidemiology (Cambridge, Mass.) 21.1 (2010): 128-138. Tutorial: https://www.unc.edu/courses/2008fall/ecol/563/001/docs/lectures/lecture19.htmThe post Interpretation of the AUC appeared first on Colman Statistics.
To leave a comment for the author, please follow the link and comment on their blog: Colman Statistics.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.