gRaphics! 2012-05-12 14:07:00
[This article was first published on gRaphics!, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
My own version of bubble plot (part 1)
During one of my projects, I found myself in need of visualizing more than 3 dimensions at once. Three-dimensional graphs are not a good solution, usually – they will need to be properly oriented, for a start, ad that’s tricky.
So, I started looking at bubble plots. The size of the bubble can show one property, as illustrated by the nice post at FlowingData – then you can show one more property defined by a color scale (continuous below, but nothing stops it from being categorical)
I decided to push it and have two properties: look at the example below – the redder the color, the higher the value on the property ApKUpt (or whatever you want). The greener, the higher ApVUpt. I moved the color legend to a square on the extreme right to achieve a better use of the available space.
I tried three colors but it turns out that it just doesn’t work. Even when your eyes don’t interpret every rgb triplet as a completely different color, the amount of redness, greeness or blueness is difficult to estimate. Also, it gets tricky to show the color grading in a legend… One has to resort to slices of the three-dimensional color space. See what I mean?
Of course, one can define an ad-hoc color scale, such as the one used below, vaguely inspired by the colors that Mathematica uses to paint its surfaces. Many thanks to my colleague Pär for teaching me how to define these kind of color scales, and much else.
Here follows the code for the one, two and three colors plot:This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
# bubble plot - takes in a dataframe with five different properties, for X, Y, Size, ColorA, B and C, plus additional parameters to control appearances of the plots produced bubbles_in_2colors<-function( dataset, to_jitter=F, prop_names=NULL, titles=list(main='X vs Y', color='colors', size='size'), min_size = 1, max_size = 5, max_sat = 200, fav_col='green', alpha_default=0.7, ...) { # NB: this code can handle a third primary color, but human eyes can't so for the moment I've forbidden that option. if(exists('dataset')) { if (is.null(prop_names)) {print(prop_names<-names(dataset))}; if (length(prop_names) ==4) {propX<-dataset[,1]; propY<-dataset[,2]; propS<-dataset[,3]; propCA<-dataset[,4]} else if (length(prop_names) ==5) {propX<-dataset[,1]; propY<-dataset[,2]; propS<-dataset[,3]; propCA<-dataset[,4]; propCB<-dataset[,5]} else if (length(prop_names) ==6) {propX<-dataset[,1]; propY<-dataset[,2]; propS<-dataset[,3]; propCA<-dataset[,4]; propCB<-dataset[,5]; propCC<-dataset[,6]; print("Three colors are too many for measly human eyes")} else {stop(sprintf("I don't know what to do with an array with %d properties",length(prop_names)))} }else {stop('did you actually give me a dataset to work with?')} rangepropX<-range(propX); log_x=log_it(rangepropX); if (!exists('xlab')) {xlab=prop_names[1]} rangepropY<-range(propY); log_y=log_it(rangepropY); if (!exists('ylab')) {ylab=prop_names[2]} rangepropS<-range(propS); log_s=log_it(rangepropS); # the line below restricts the sizes in case the span of values isn't too spread out. Max2min<- rangepropS[2]/rangepropS[1]; if ((!log_s) & (Max2min<5)) {min_size=2; max_size=min_size*Max2min} log_xy=paste(if(log_x){'x'}else{''},if(log_y){'y'}else{''},sep=""); rangepropCA<-range(propCA); log_cA=log_it(rangepropCA); if (exists('propCB')) {rangepropCB<-range(propCB); log_cB=log_it(rangepropCB);} else{rangepropCB<-c(0,0); log_cB=F} if (exists('propCC')) {rangepropCC<-range(propCC); log_cC=log_it(rangepropCC);} else{rangepropCC<-c(0,0); log_cC=F} pch <- if (!exists('pch')) {21} else {pch} # 21 for circles, 22 for squares, 23 for diamonds, 24 for triangles up, 25 for triangles down # the color palette is obtained by spreading the known palette (log)proportionally across the range of propC # how about setting the color directly in rgb space? if (!exists('palette_l')) {palette_l<-50} colorpalette<-0:palette_l*(max_sat/255)/palette_l print(colorpalette) if(fav_col=='red') { r_colpaltt<- colorpalette[1+invprop(propCA,rangepropCA[1],rangepropCA[2],log_cA)*(palette_l-1)] g_colpaltt<- if (exists('propCB')) colorpalette[1+invprop(propCB,rangepropCB[1],rangepropCB[2],log_cB)*(palette_l-1)] else 0 b_colpaltt<- if (exists('propCC')) colorpalette[1+invprop(propCC,rangepropCC[1],rangepropCC[2],log_cC)*(palette_l-1)] else 0 }else if(fav_col=='green') { g_colpaltt<- colorpalette[1+invprop(propCA,rangepropCA[1],rangepropCA[2],log_cA)*(palette_l-1)] r_colpaltt<- if (exists('propCB')) colorpalette[1+invprop(propCB,rangepropCB[1],rangepropCB[2],log_cB)*(palette_l-1)] else 0 b_colpaltt<- if (exists('propCC')) colorpalette[1+invprop(propCC,rangepropCC[1],rangepropCC[2],log_cC)*(palette_l-1)] else 0 }else if(fav_col=='blue') { b_colpaltt<- colorpalette[1+invprop(propCA,rangepropCA[1],rangepropCA[2],log_cA)*(palette_l-1)] g_colpaltt<- if (exists('propCB')) colorpalette[1+invprop(propCB,rangepropCB[1],rangepropCB[2],log_cB)*(palette_l-1)] else 0 r_colpaltt<- if (exists('propCC')) colorpalette[1+invprop(propCC,rangepropCC[1],rangepropCC[2],log_cC)*(palette_l-1)] else 0 } colpaltt<- rgb(r_colpaltt, g_colpaltt, b_colpaltt, alpha_default) # print(colpaltt) if (!length(colpaltt[!is.na(colpaltt)])) colpaltt[is.na(colpaltt)]<-'black' # the size palette is obtained by (log)proportional representation of the propC range in the min_size-max_size range. # minimum and maximum for printing symbols 0.5 to 1, or given by the user. sizepaltt<-(prop(invprop(propS,rangepropS[1],rangepropS[2],log_s),min_size, max_size, log_s)) if (!length(sizepaltt[!is.na(sizepaltt)])) sizepaltt[is.na(sizepaltt)]<-1 # print("defined palettes"); print(colpaltt); print(sizepaltt) if (to_jitter) { jittered_X <- jitter_logalong(propX); jittered_Y <- jitter_logalong(propY); }else{ jittered_X <- (propX); jittered_Y <- (propY); } # print("jittered X and Y") # layout(matrix(c(1,1,1,2,1,1,1,3), 2, 4, byrow = TRUE)) # this works but let's try something fancier if(!exists('propCB')) {layout(matrix(c(2,2,1,3), 2, 2, byrow = TRUE), widths=c(3,1), heights=c(1,3))} else if(!exists('propCC')) {layout(matrix(c(1,2,1,3), 2, 2, byrow = TRUE), widths=c(3,1), heights=c(2,2))} else {layout(matrix(c(1,2,1,3,1,4,1,5), 4, 2, byrow = TRUE), widths=c(4,1), heights=c(1,1,1,1))} # print(sprintf("X from %G to %G, Y from %G to %G",rangepropX[1],rangepropX[2],rangepropY[1],rangepropY[2])) plot(NULL, xlim=rangepropX, ylim=rangepropY, log=log_xy, bty='n', xaxt='n', yaxt='n', main=titles$main, xlab=prop_names[1], ylab=prop_names[2]); # , xaxs="i" to plot exactly within xlim, # grid(nx = NULL, ny = NULL, col = "darkgray", lty = "dotted", lwd = 1, equilogs = T) my_grid(nx = NULL, ny = NULL, xlims=rangepropX, ylims=rangepropY, col = "darkgray", lty = "dotted", lwd = 1, equilogs = T) axXticks<-sort(unique(propX)); if (length(axXticks) > 3) { if(log_it(axXticks)) { axXticks<-within(log_pretty(axXticks, 1),axXticks,F, T) }else{ axXticks<-within(pretty(axXticks),axXticks,F, T) } } axis(1, at=axXticks, labels=sprintf("%5.3g",axXticks)) axYticks<-sort(unique(propY)); if (length(axYticks) > 3) { if(log_it(axYticks)) { axYticks<-within(log_pretty(axYticks, 1),axYticks,F, T) }else{ axYticks<-within(pretty(axYticks),axYticks,F, T) } } axis(2, at=axYticks, labels=sprintf("%5.3g",axYticks)) # Not yet working - attempt to reorder points so that the small ones are plotted above the little ones # for (n in 1:length(propS)) {indx<-c(indx,which(propS==sort(propS)[n]))} # indx<-rev(indx) # jittered_X<-jittered_X[indx] # jittered_Y<-jittered_Y[indx] # propS<-propS[indx] # propC<-propC[indx] points(jittered_X, jittered_Y, , pch=21, cex=sizepaltt, col="white", bg=colpaltt); if (length(propX) < 10) {text(x=jittered_X, y=jittered_Y, labels=dots_names, pos=4)} # now it's the time to print a nice colorspace as legend for the bubbles # the colorbar is a second plot, going at the top right (see layout(matrix()) command, much above) # I'll have two different colorspaces depending on whether two or three dimensions are being used as colors if(!exists('propCB')) { plot(NULL, xlim=range(propCA), ylim=c(0,1), log=if(log_cA) 'x' else '', xaxt='n', yaxt='n', xlab=prop_names[4], ylab=prop_names[5], bty='n', main=titles$color); for (n in 1:palette_l) { if (fav_col == 'red') rgbcol<-rgb(colorpalette[n], 0, 0, alpha_default) else if (fav_col == 'green') rgbcol<-rgb(0, colorpalette[n], 0, alpha_default) else if (fav_col == 'blue') rgbcol<-rgb(0, 0, colorpalette[n], alpha_default) rect(xleft= prop(x=(n-1)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,xright= prop(x=(n)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,ytop= 1 ,ybottom= 0 , col=rgbcol , border="transparent" ) } ycolorbarlegend<- 0.5 colorticks<- if (log_cA) within(log_pretty(rangepropCA, 1),rangepropCA, F, F) else within(pretty(rangepropCA),rangepropCA, F) axis(1, at=colorticks, labels=sprintf("%5.3g",colorticks)) }else if(!exists('propCC')) { plot(NULL, xlim=range(propCA), ylim=range(propCB), log=paste(if(log_cA) 'x' else '',if(log_cB) 'y' else '', sep=''), xaxt='n', yaxt='n', xlab=prop_names[4], ylab=prop_names[5], bty='n', main=titles$color); for (m in 1:palette_l) { for (n in 1:palette_l) { rgbcol<-rgb(colorpalette[n],colorpalette[m],0, alpha_default); rect(xleft= prop(x=(n-1)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,xright= prop(x=(n)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,ytop= prop(x=(m-1)/(palette_l),rangepropCB[1],rangepropCB[2],log_cB) ,ybottom= prop(x=(m)/(palette_l),rangepropCB[1],rangepropCB[2],log_cB) ,col=rgbcol ,border="transparent" ) } } colorAticks<- if (log_cA) within(log_pretty(rangepropCA, 1),rangepropCA, F, T) else within(pretty(rangepropCA),rangepropCA, F, T) axis(1, at=colorAticks, labels=sprintf("%5.3g",colorAticks)) colorBticks<- if (log_cB) within(log_pretty(rangepropCB, 1),rangepropCB, F, T) else within(pretty(rangepropCB),rangepropCB, F, T) axis(2, at=colorBticks, labels=sprintf("%5.3g",colorBticks)) }else{ for (o in c(1,palette_l/2, palette_l)) { plot(NULL, xlim=range(propCA), ylim=range(propCB), log=paste(if(log_cA) 'x' else '',if(log_cB) 'y' else '', sep=''), xaxt='n', yaxt='n', xlab=prop_names[4], ylab=prop_names[5], bty='n', main=sprintf("ColC=%g percent",100*o/palette_l)); for (m in 1:palette_l) { for (n in 1:palette_l) { rgbcol<-rgb(colorpalette[n],colorpalette[m],colorpalette[o], alpha_default) rect(xleft= prop(x=(n-1)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,xright= prop(x=(n)/(palette_l),rangepropCA[1],rangepropCA[2],log_cA) ,ytop= prop(x=(m-1)/(palette_l),rangepropCB[1],rangepropCB[2],log_cB) ,ybottom= prop(x=(m)/(palette_l),rangepropCB[1],rangepropCB[2],log_cB) ,col=rgbcol ,border="transparent" ) } } # flush.console(); Sys.sleep(0.5) } colorAticks<- if (log_cA) within(log_pretty(rangepropCA, 1),rangepropCA, F, T) else within(pretty(rangepropCA),rangepropCA, F, T) axis(1, at=colorAticks, labels=sprintf("%5.3g",colorAticks)) colorBticks<- if (log_cB) within(log_pretty(rangepropCB, 1),rangepropCB, F, T) else within(pretty(rangepropCB),rangepropCB, F, T) axis(2, at=colorBticks, labels=sprintf("%5.3g",colorBticks)) } # the bubble legend is a third plot, going in the lower right. plot(NULL, xlim=widen(c(-1,1), 0.9), ylim=widen(rangepropS, 0.9), log=if(log_s) 'y' else '', xaxt='n', yaxt='n', xlab='', ylab='', bty='n', main=titles$size); circles <- if (log_s) within(log_pretty(rangepropS, 1), rangepropS, strict=F, wantrng=F) else within(pretty(rangepropS), rangepropS, strict=F, wantrng=F) #all of them # but I only select five at the most: circles<-rev(circles[ceiling((1:5)*length(circles)/5)]) # select 5 circles for the legend. # I may want to get rid of one of the the last two, in case their values are too close # if (log_s) {if(circles[length(circles)]/circles[length(circles)-1] < 25) {circles<-circles[1:(length(circles)-1)]}} # else {if((circles[length(circles)]-circles[length(circles)-1]) < 0.7(circles[length(circles)-1]-circles[length(circles)-2])) {circles<-circles[1:(length(circles)-1)]}} # xlegend<-2 points(rep(-1,length(circles)), circles, pch=21, bg='grey', col='white', cex=prop(invprop(circles, min(circles), max(circles)),min(sizepaltt),max(sizepaltt))) text(rep(0,length(circles)),circles, labels=sprintf("%5.3g",circles)) }
It’s messy and not at all clean – but it gets the job done. This routine is also dependent from several others which define colorscale and other accessory functions… feel free to drop me a line in the comments if you want the lot… Similar plots can be obtained with ggplot2 in much fewer lines, although right now I’m less expert at it so they’re much less customised.
To leave a comment for the author, please follow the link and comment on their blog: gRaphics!.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.