Conditional densities, on one single graph
[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
With Stéphane Tufféry, we’ve been working on credit scoring [1], and we’ve been using the popular German credit dataset,
> myVariableNames <- c("checking_status","duration","credit_history",
+   "purpose","credit_amount","savings","employment","installment_rate",
+   "personal_status","other_parties","residence_since","property_magnitude",
+   "age","other_payment_plans","housing","existing_credits","job",
+   "num_dependents","telephone","foreign_worker","class")
> credit = read.table(
+   "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
+   header=FALSE,col.names=myVariableNames)
> # shift the class variable by one, so that it takes the values 0 and 1 used below
> credit$class <- credit$class-1
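We wanted some clean code to produce a single graph with the density of a quantitative variable in each of the two classes, and the corresponding boxplots underneath.
Yesterday, Stéphane came up with the following code, which can easily be adapted: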
> library(RColorBrewer)
> CL=brewer.pal(6, "RdBu")
> varQuanti = function(base,y,x)
+ {
+   # top panel (densities) three times as tall as the bottom one (boxplots)
+   layout(matrix(c(1, 2), 2, 1, byrow = TRUE),heights=c(3, 1))
+   par(mar = c(2, 4, 2, 1))
+   # split the data according to the binary variable y (coded 0/1)
+   base0 <- base[base[,y]==0,]
+   base1 <- base[base[,y]==1,]
+   # common axis limits, so that the two densities are comparable
+   xlim1 <- range(c(base0[,x],base1[,x]))
+   ylim1 <- c(0,max(max(density(base0[,x])$y),max(density(base1[,x])$y)))
+   plot(density(base0[,x]),main=" ",col=CL[1],ylab=paste("Density of ",x),
+        xlim = xlim1, ylim = ylim1 ,lwd=2)
+   # overlay the second density on the same axes
+   par(new = TRUE)
+   plot(density(base1[,x]),col=CL[6],lty=1,lwd=2,
+        xlim = xlim1, ylim = ylim1,xlab = '', ylab = '',main=' ')
+   legend("topright",c(paste(y," = 0"),paste(y," = 1")),
+          lty=1,col=CL[c(1,6)],lwd=2)
+   # Kruskal-Wallis statistic, rounded to three decimals
+   texte <- c("Kruskal-Wallis'Chi² = \n\n",
+              round(kruskal.test(base[,x]~base[,y])$statistic*1000)/1000)
+   text(xlim1[2]*0.8, ylim1[2]*0.5, texte,cex=0.75)
+   # bottom panel: horizontal boxplots, one per class
+   boxplot(base[,x]~base[,y],horizontal = TRUE,xlab= y,col=CL[c(2,5)])
+ }
> varQuanti(credit,"class","duration")
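As an illustration of how the code might be adapted, here is a minimal sketch (not Stéphane's code) that overlays one density per class for any number of classes, using lines() rather than par(new = TRUE). The function name densitiesByClass, its palette argument and the simulated data frame sim are all hypothetical, introduced only for this example; the simulated durations merely stand in for the real dataset so that the sketch runs without downloading anything. Note that the horizontal range here comes from the density estimates, not from the raw data as in the code above.

```r
# Hypothetical helper (an assumption, not from the original post):
# one density curve per class, all drawn on the same axes.
densitiesByClass <- function(base, y, x, palette = NULL) {
  groups <- split(base[, x], base[, y])     # one numeric vector per class
  dens   <- lapply(groups, density)         # kernel density estimate per class
  if (is.null(palette)) palette <- seq_along(dens)   # default base R colours
  # common limits, computed from the density estimates themselves
  xlim1 <- range(sapply(dens, function(d) range(d$x)))
  ylim1 <- c(0, max(sapply(dens, function(d) max(d$y))))
  plot(dens[[1]], main = "", xlab = x, ylab = paste("Density of", x),
       xlim = xlim1, ylim = ylim1, col = palette[1], lwd = 2)
  # add the remaining classes to the existing plot
  for (i in seq_along(dens)[-1]) lines(dens[[i]], col = palette[i], lwd = 2)
  legend("topright", paste(y, "=", names(dens)),
         lty = 1, col = palette, lwd = 2)
  invisible(dens)                           # return the estimates, silently
}

# Simulated data (purely illustrative, not the German credit dataset)
set.seed(1)
sim <- data.frame(
  class    = rep(c(0, 1), times = c(700, 300)),
  duration = c(rgamma(700, shape = 3, rate = 0.15),
               rgamma(300, shape = 3, rate = 0.12))
)
```

Calling densitiesByClass(sim, "class", "duration") draws the two curves, and the same call should work on the credit data frame loaded above. Using lines() keeps a single set of axes, which avoids redrawing them as par(new = TRUE) does.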
The code is not complex, but since I usually waste a lot of time on my graphs, I will try to post short entries dedicated to graphs in R (without ggplot) more frequently.
[1] For a chapter on statistical learning in the forthcoming Computational Actuarial Science with R.