What it the interpretation of the diagonal for a ROC curve

arthur charpentier

3 years ago

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last Friday, we discussed the use of ROC curves to describe the goodness of a classifier. I did say that I will post a brief paragraph on the interpretation of the diagonal. If you look around some say that it describes the “strategy of randomly guessing a class“, that it is obtained with “a diagnostic test that is no better than chance level“, even obtained by “making a prediction by tossing of an unbiased coin“.

Let us get back to ROC curves to illustrate those points. Consider a very simple dataset with 10 observations (that is not linearly separable)

x1 = c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
x2 = c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
y = c(1,1,1,1,1,0,0,1,0,0)
df = data.frame(x1=x1,x2=x2,y=as.factor(y))

here we can check that, indeed, it is not separable

plot(x1,x2,col=c("red","blue")[1+y],pch=19)

Consider a logistic regression (the course is on linear models)

reg = glm(y~x1+x2,data=df,family=binomial(link = "logit"))

but any model here can be used… We can use our own function

Y=df$y S=predict(reg) roc.curve=function(s,print=FALSE){
  Ps=(S>=s)*1    
  FP=sum((Ps==1)*(Y==0))/sum(Y==0)      
  TP=sum((Ps==1)*(Y==1))/sum(Y==1)    
if(print==TRUE){      
print(table(Observed=Y,Predicted=Ps))      
}    
vect=c(FP,TP)    
names(vect)=c("FPR","TPR")    
return(vect) }

or any R package actually

library(ROCR) 
perf=performance(prediction(S,Y),"tpr","fpr")

We can plot the two simultaneously here

plot(performance(prediction(S,Y),"tpr","fpr"))
V=Vectorize(roc.curve)(seq(-5,5,length=251)) 
points(V[1,],V[2,])
 segments(0,0,1,1,col="light blue")

So our code works just fine, here. Let us consider various strategies that should lead us to the diagonal.

The first one is : everyone has the same probability (say 50%)

S=rep(.5,10)
 plot(performance(prediction(S,Y),"tpr","fpr")) 
V=Vectorize(roc.curve)(seq(0,1,length=251)) 
points(V[1,],V[2,])

Indeed, we have the diagonal. But to be honest, we have only two points here : \((0,0)\) and \((1,1)\). Claiming that we have a straight line is not very satisfying… Actually, note that we have this situation whatever the probability we choose

S=rep(.2,10) 
plot(performance(prediction(S,Y),"tpr","fpr"))
 V=Vectorize(roc.curve)(seq(0,1,length=251)) 
points(V[1,],V[2,])

We can try another strategy, like “making a prediction by tossing of an unbiased coin“. This is what we obtain

set.seed(1) 
S=sample(0:1,size=10,replace=TRUE)
 plot(performance(prediction(S,Y),"tpr","fpr")) 
V=Vectorize(roc.curve)(seq(0,1,length=251)) 
points(V[1,],V[2,])
 segments(0,0,1,1,col="light blue")

We can also try some sort of “random classifier”, where we choose the score randomly, say uniform on the unit interval

set.seed(1)
 S=runif(10)
 plot(performance(prediction(S,Y),"tpr","fpr")) 
V=Vectorize(roc.curve)(seq(0,1,length=251)) 
points(V[1,],V[2,]) 
segments(0,0,1,1,col="light blue")

Let us try to go further on that one. For convenience, let us consider another function to plot the ROC curve

V=Vectorize(roc.curve)(seq(0,1,length=251))
 roc_curve=Vectorize(function(x) max(V[2,which(V[1,]<=x)]))

We have the same line as previously

x=seq(0,1,by=.025) 
y=roc_curve(x)  
lines(x,y,type="s",col="red")

But now, consider many scoring strategies, all randomly chosen

MY=matrix(NA,500,length(y))
 for(i in 1:500){   
S=runif(10)   
V=Vectorize(roc.curve)(seq(0,1,length=251))   
MY[i,]=roc_curve(x) 
}
 plot(performance(prediction(S,df$y),"tpr","fpr"),col="white") 
for(i in 1:500){   lines(x,MY[i,],col=rgb(0,0,1,.3),type="s") }
 lines(c(0,x),c(0,apply(MY,2,mean)),col="red",type="s",lwd=3) segments(0,0,1,1,col="light blue")

The red line is the average of all random classifiers. It is not a straight line, be we observe oscillations around the diagonal.

Consider a dataset with more observations

 myocarde = read.table("http://freakonometrics.free.fr/myocarde.csv",head=TRUE, sep=";") 
myocarde$PRONO = (myocarde$PRONO=="SURVIE")*1 
reg = glm(PRONO~.,data=myocarde,family=binomial(link = "logit")) 
Y=myocarde$PRONO 
S=predict(reg)
 plot(performance(prediction(S,Y),"tpr","fpr"))
 V=Vectorize(roc.curve)(seq(-5,5,length=251))
 points(V[1,],V[2,])
 segments(0,0,1,1,col="light blue")

Here is a “random classifier” where we draw scores randomly on the unit interval

S=runif(nrow(myocarde)
 plot(performance(prediction(S,Y),"tpr","fpr")) 
V=Vectorize(roc.curve)(seq(-5,5,length=251)) 
points(V[1,],V[2,]) 
segments(0,0,1,1,col="light blue")

And if we do that 500 times, we obtain, on average

MY=matrix(NA,500,length(y))
 for(i in 1:500){   
S=runif(length(Y))   
V=Vectorize(roc.curve)(seq(0,1,length=251))   
MY[i,]=roc_curve(x) } 
plot(performance(prediction(S,Y),"tpr","fpr"),col="white")
 for(i in 1:500){   lines(x,MY[i,],col=rgb(0,0,1,.3),type="s") 
} 
lines(c(0,x),c(0,apply(MY,2,mean)),col="red",type="s",lwd=3) segments(0,0,1,1,col="light blue")

So, it looks like me might say that the diagonal is what we have, on average, when drawing randomly scores on the unit interval…

I did mention that an interesting visual tool could be related to the use of the Kolmogorov Smirnov statistic on classifiers. We can plot the two empirical cumulative distribution functions of the scores, given the response \(Y\)

score=data.frame(yobs=Y,                  ypred=predict(reg,type="response"))
 f0=c(0,sort(score$ypred[score$yobs==0]),1) 
f1=c(0,sort(score$ypred[score$yobs==1]),1)
 plot(f0,(0:(length(f0)-1))/(length(f0)-1),col="red",type="s",lwd=2,xlim=0:1) 
lines(f1,(0:(length(f1)-1))/(length(f1)-1),col="blue",type="s",lwd=2)

we can also look at the distribution of the score, with the histogram (or density estimates)

S=score$ypred 
hist(S[Y==0],col=rgb(1,0,0,.2),      probability=TRUE,breaks=(0:10)/10,border="white") 
hist(S[Y==1],col=rgb(0,0,1,.2),      probability=TRUE,breaks=(0:10)/10,border="white",add=TRUE) 
lines(density(S[Y==0]),col="red",lwd=2,xlim=c(0,1))
 lines(density(S[Y==1]),col="blue",lwd=2)

The underlying idea is the following : we do have a “perfect classifier” (top left corner)

is the supports of the scores do not overlap

otherwise, we should have errors. That the case below

we in 10% of the cases, we might have misclassification

or even more missclassification, with overlapping supports

Now, we have the diagonal

when the two conditional distributions of the scores are identical

Of course, that only valid when \(n\) is very large, otherwise, it is only what we observe on average….

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.