[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently I read the post Comparing all quantiles of two distributions simultaneously on R-bloggers. In it, the author plots two conditional densities on one graph. I often use such a plot to visualize the conditional densities of scores in binary prediction. After running several times into the problem of scaling the plot so that both densities always fit into the plotting region, I wrote a small snippet that handles it.
Here is the code of the function. It scales both x and y axes appropriately:
# class: binary explained variable
# score: score obtained from prediction model
# main, xlab, col, lty, lwd: passed to plot function
# lx, ly: passed to legend function as x and y
cdp <- function(class, score,
                main = "Conditional density", xlab = "score",
                col = c(2, 4), lty = c(1, 1), lwd = c(1, 1),
                lx = "topleft", ly = NULL) {
    class <- factor(class)
    if (length(levels(class)) != 2) {
        stop("class must have two levels")
    }
    if (!is.numeric(score)) {
        stop("score must be numeric")
    }
    cscore <- split(score, class)
    cdensity <- lapply(cscore, density)
    xlim <- range(cdensity[[1]]$x, cdensity[[2]]$x)
    ylim <- range(cdensity[[1]]$y, cdensity[[2]]$y)
    plot(cdensity[[1]], main = main, xlab = xlab, col = col[1],
         lty = lty[1], lwd = lwd[1], xlim = xlim, ylim = ylim)
    lines(cdensity[[2]], col = col[2], lty = lty[2], lwd = lwd[2])
    legend(lx, ly, names(cdensity),
           lty = lty, col = col, lwd = lwd)
}
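The joint axis limits are the core of the function: if the plot were sized for the first density alone, a taller or shifted second density could be clipped. A minimal sketch of this effect on simulated data follows; the scores and the names score_a, score_b, dens_a and dens_b are made up for illustration and are not part of the original post.

```r
# Simulated scores for two classes (illustrative data, not from the post)
set.seed(1)
score_a <- rnorm(200, mean = 0, sd = 1)
score_b <- rnorm(200, mean = 3, sd = 0.3)

# Kernel density estimate for each class
dens_a <- density(score_a)
dens_b <- density(score_b)

# Limits taken from the first density alone ...
ylim_first <- range(dens_a$y)
# ... versus limits covering both densities, as cdp computes them
xlim_both <- range(dens_a$x, dens_b$x)
ylim_both <- range(dens_a$y, dens_b$y)

# The narrow second density peaks above the first one's limits,
# so it would be cut off without the joint range
clipped <- max(dens_b$y) > ylim_first[2]

plot(dens_a, xlim = xlim_both, ylim = ylim_both, col = 2,
     main = "Joint axis limits", xlab = "score")
lines(dens_b, col = 4)
```

Dropping the xlim and ylim arguments from the plot call shows the clipped version for comparison.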
As an example of its application, I compare its output with the standard cdplot on a simple classification problem:
data(Participation, package = "Ecdat")
data.set <- Participation
data.set$age2 <- data.set$age ^ 2
glm.model <- glm(lfp ~ ., data = data.set, family = binomial(link = probit))
par(mfrow = c(1, 2))
cdp(data.set$lfp, predict(glm.model), main = "cdp")
cdplot(factor(data.set$lfp) ~ predict(glm.model),
       main = "cdplot", xlab = "score", ylab = "lfp")
Here is the resulting plot:
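One detail worth keeping in mind when reading the score axis: predict on a glm object returns values on the link scale by default, so the scores passed to cdp in the example are probit index values rather than probabilities. The sketch below shows the two scales on simulated data; the names x, y, fit, score_link and score_prob are hypothetical and the data is not the Participation set.

```r
# Simulated binary outcome (illustrative; not the Participation data)
set.seed(42)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = pnorm(0.5 + x))

fit <- glm(y ~ x, family = binomial(link = "probit"))

# Default: linear predictor on the link (probit index) scale
score_link <- predict(fit)
# Fitted probabilities, obtained with type = "response"
score_prob <- predict(fit, type = "response")

# For a probit link the two are related by the standard normal CDF
same <- all.equal(unname(score_prob), unname(pnorm(score_link)))
```

Either scale can be passed to cdp. The transformation is monotone, so the ordering of the scores is unchanged, although the shape of the plotted densities differs.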