The Double Density Plot Contains a Lot of Useful Information
The double density plot contains a lot of useful information.
This is a plot that shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, scaled so that the area under each curve is 1.
An example is given here.
The really cool observation I wanted to share is: if we know this classifier is well calibrated, then we can recover the positive category prevalence from the graph.
A well calibrated probability score is one such that E[outcome == TRUE] = E[prediction]. For such a classifier we must have p = E[prediction | on negative curve] / (1 + E[prediction | on negative curve] - E[prediction | on positive curve]) for the unknown positive outcome prevalence p. This is because the following relation holds in this case:
p E[prediction | on positive curve] + (1 - p) E[prediction | on negative curve] = p
This follows as p and 1-p are the relative sizes of the positive and negative classes, prior to being re-scaled to integrate to one as part of the density. The left-hand side is therefore E[prediction] split by outcome class, and by calibration it equals E[outcome == TRUE], which is p. The conditional expectations E[prediction | on positive curve] and E[prediction | on negative curve] are depicted on the double density plot, so from them we can recover the prevalence p.
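To make the relation concrete, here is a minimal simulation sketch in Python (standard library only). It is an illustration under stated assumptions, not the code behind the figure: the Beta(2, 5) score distribution, the seed, and the names `recover_prevalence` and `simulate` are all hypothetical choices made for this example. Outcomes are drawn so that each one is TRUE with probability equal to its score, which makes the score calibrated by construction.

```python
import random


def recover_prevalence(mean_pos, mean_neg):
    """Solve p*mean_pos + (1 - p)*mean_neg = p for the prevalence p."""
    return mean_neg / (1.0 + mean_neg - mean_pos)


def simulate(n=100_000, seed=2024):
    """Draw (score, outcome) pairs from a hypothetical calibrated model."""
    rng = random.Random(seed)
    # Assumption: scores follow a Beta(2, 5) distribution.
    scores = [rng.betavariate(2, 5) for _ in range(n)]
    # Each outcome is TRUE with probability equal to its own score.
    outcomes = [rng.random() < s for s in scores]
    return scores, outcomes


scores, outcomes = simulate()
pos_scores = [s for s, y in zip(scores, outcomes) if y]
neg_scores = [s for s, y in zip(scores, outcomes) if not y]

# The two conditional means, i.e. the means of the two density curves.
mean_pos = sum(pos_scores) / len(pos_scores)
mean_neg = sum(neg_scores) / len(neg_scores)

estimated = recover_prevalence(mean_pos, mean_neg)
observed = sum(outcomes) / len(outcomes)

print("prevalence recovered from the two curve means:", estimated)
print("prevalence observed in the simulated data:   ", observed)
```

The two printed values should agree closely, with the gap shrinking as `n` grows. The recovery uses only the two conditional means, which is exactly the information the double density plot makes visible.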
The recovery of the prevalence from the two conditional means is shown in the earlier figure.
We have some additional results coming out for what I am currently calling “fully calibrated probability scores.” These are scores such that E[outcome == TRUE | prediction = p] = p for all p in the interval [0, 1]. This includes a very interesting special case where it is easy to show that the prevalence is the probability value where the density curves cross.
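The crossing claim can be checked numerically. The sketch below is an illustration under assumptions of my own choosing, and not necessarily the special case referred to above: it takes the score to be fully calibrated with a Beta(a, b) marginal distribution, so the prevalence is a / (a + b). By Bayes' rule the positive-class density is then s f(s) / p and the negative-class density is (1 - s) f(s) / (1 - p), where f is the marginal score density. The function names are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def class_densities(s, a, b):
    """Return (positive, negative) conditional score densities at s."""
    p = a / (a + b)               # prevalence equals E[score] here
    f = beta_pdf(s, a, b)         # marginal density of the score
    pos = s * f / p               # density of the score given outcome TRUE
    neg = (1 - s) * f / (1 - p)   # density of the score given outcome FALSE
    return pos, neg


def crossing_point(a, b, lo=1e-6, hi=1 - 1e-6, iters=100):
    """Bisect for the score at which the two class densities are equal."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        pos, neg = class_densities(mid, a, b)
        if pos < neg:
            lo = mid   # still left of the crossing
        else:
            hi = mid   # at or right of the crossing
    return (lo + hi) / 2


a, b = 2.0, 5.0  # hypothetical shape parameters
print("crossing point:", crossing_point(a, b))
print("prevalence:    ", a / (a + b))
```

Setting the two densities equal gives s / p = (1 - s) / (1 - p) wherever f(s) is positive, which has the single solution s = p. The bisection search finds that same point.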