Plotting cool graphs in R
I have to admit to being a bit of a snob when it comes to graphs and charts in scientific papers and presentations. It’s not like I think I am particularly good at it – I’m OK – it’s just that I know what’s bad. I’ve seen folk screenshot multiple Excel graphs so they can paste them into a PowerPoint table to create multi-panel plots… and it kind of makes me want to scream. I’m sorry, I really am, but when I see Excel plots in papers I judge the authors, and I don’t mean in a good way. I can’t help it. Plotting good graphs is an art and, sticking with the metaphor, Excel is paint-by-numbers while R is a blank canvas, waiting for something beautiful to be created; Excel is limiting, whereas R sets you free.
Readers of this blog will know that I like to take fabulous plots that I find and recreate them. Well, let’s do that again.
I saw this tweet by Trevor Branch and found it intriguing:
Revising the spaghetti plot: small multiples with gray in the background for the other lines https://t.co/RX2IrEAW0M pic.twitter.com/ayIG1StMDY
— Trevor A. Branch (@TrevorABranch) August 24, 2016
It shows two plots of the same data. The Excel plot:
And the multi plot:
You’re clearly supposed to think the latter is better, and I do. Perhaps disappointingly, though, the top graph would be easy to plot in Excel, but I’m guessing most people would find it impossible to create the bottom one (in Excel or otherwise).
Well, I’m going to show you how to create both, in R. All code is now on GitHub!
The Excel Graph
Now, I’ve shown you how to create Excel-like graphs in R before, and we’ll use some of the same tricks again.
First we set up the data:
# set up the data
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)
df

Now let's plot the graph:
# set up colours and points
cols <- c("darkolivegreen3","darkcyan","mediumpurple2","coral3")
pch <- c(17,18,8,15)

# we have one point on X axis for each row of df (nrow(df))
# we then add 2.5 to make room for the legend
xmax <- nrow(df) + 2.5

# make the borders smaller
par(mar=c(3,3,0,0))

# plot an empty graph
plot(1:nrow(df), 1:nrow(df), pch="",
     xlab=NA, ylab=NA, xaxt="n", yaxt="n",
     ylim=c(0,35), bty="n", xlim=c(1,xmax))

# add horizontal lines
for (i in seq(0,35,by=5)) {
  lines(1:nrow(df), rep(i,nrow(df)), col="grey")
}

# add points and lines
# for each dataset
for (i in 1:ncol(df)) {
  points(1:nrow(df), df[,i], pch=pch[i], col=cols[i], cex=1.5)
  lines(1:nrow(df), df[,i], col=cols[i], lwd=4)
}

# add bottom axes
axis(side=1, at=1:nrow(df), tick=FALSE, labels=rownames(df))
axis(side=1, at=seq(-0.5,8.5,by=1), tick=TRUE, labels=NA)

# add left axis (las=1 keeps the labels horizontal)
axis(side=2, at=seq(0,35,by=5), tick=TRUE, las=1,
     labels=paste(seq(0,35,by=5),"%",sep=""))

# add legend
legend(8.5,25,legend=colnames(df), pch=pch, col=cols,
       cex=1.5, bty="n", lwd=3, lty=1)
And here is the result:
Not bad eh? Actually, yes, very bad; but also very Excel!
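If you'd rather not build everything up by hand, here is a rough sketch using base R's matplot(), which draws each column of a matrix as its own series. To be clear, this is not the code that produced the figure above and it makes no attempt to mimic Excel's gridlines or legend placement; it just assumes the same data, colours and point symbols, and puts the real years on the x axis (my choice, not something the original plot does).

```r
# same data as above, repeated so this block runs on its own
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)

# same colours and point symbols as the Excel-style plot
cols <- c("darkolivegreen3","darkcyan","mediumpurple2","coral3")
pch <- c(17,18,8,15)

# assumption: use the years themselves as x values, not positions 1..8
years <- as.numeric(rownames(df))

# small margins, horizontal axis labels, no box; remember the old settings
op <- par(mar=c(3,3,1,1), las=1, bty="n")

# one call draws points and lines for all four columns
matplot(years, as.matrix(df), type="o", lty=1, lwd=4,
        pch=pch, col=cols, cex=1.5, ylim=c(0,35),
        xlab="", ylab="")

# legend position is a guess and may need moving to avoid the lines
legend("topright", legend=colnames(df), pch=pch, col=cols,
       lty=1, lwd=3, bty="n")

# restore the previous graphical settings
par(op)
```

It's less code, but you give up the fine control over axes and spacing that makes the hand-built version look so convincingly like Excel.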
The multi-plot
Plotting multi-panel figures in R is sooooooo easy! Here we go for the alternative multi-plot. We use the same data, and split.screen() to divide the plotting area into a 2 x 2 grid.
# split into 2 rows and 2 cols
split.screen(c(2,2))

# keep track of which screen we are
# plotting to
scr <- 1

# iterate over columns
for (i in 1:ncol(df)) {

  # select screen
  screen(scr)

  # reduce margins
  par(mar=c(3,2,1,1))

  # empty plot
  plot(1:nrow(df), 1:nrow(df), pch="",
       xlab=NA, ylab=NA, xaxt="n", yaxt="n",
       ylim=c(0,35), bty="n")

  # plot all data in grey
  for (j in 1:ncol(df)) {
    lines(1:nrow(df), df[,j], col="grey", lwd=3)
  }

  # plot selected in blue
  lines(1:nrow(df), df[,i], col="blue4", lwd=4)

  # add blobs
  points(c(1,nrow(df)), c(df[1,i], df[nrow(df),i]),
         pch=16, cex=2, col="blue4")

  # add numbers
  mtext(df[1,i], side=2, at=df[1,i], las=2)
  mtext(df[nrow(df),i], side=4, at=df[nrow(df),i], las=2)

  # add title
  title(colnames(df)[i])

  # add axes if we are one of
  # the bottom two plots
  if (scr >= 3) {
    axis(side=1, at=1:nrow(df), tick=FALSE, labels=rownames(df))
  }

  # next screen
  scr <- scr + 1
}

# close multi-panel image
close.screen(all=TRUE)
And here is the result:
And there we have it.
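As an aside, split.screen() is not the only way to lay out panels in base R. Below is a minimal sketch of the same small multiples using par(mfrow=c(2,2)), which fills a 2 x 2 grid row by row as each new plot is started. It is an alternative I'm assuming gives an equivalent layout, not the code behind the figure above; the right-hand margin is widened slightly (my choice) to leave room for the end labels.

```r
# same data as above, repeated so this block runs on its own
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)

n <- nrow(df)

# 2 x 2 grid filled row by row; keep the old settings so we can restore them
op <- par(mfrow=c(2,2), mar=c(3,2,2,2))

for (i in seq_len(ncol(df))) {

  # each call to plot() moves on to the next panel in the grid
  plot(1:n, 1:n, type="n", xlab="", ylab="",
       xaxt="n", yaxt="n", ylim=c(0,35), bty="n")

  # all series in grey as background
  for (j in seq_len(ncol(df))) {
    lines(1:n, df[,j], col="grey", lwd=3)
  }

  # highlighted series, with blobs and numbers at both ends
  lines(1:n, df[,i], col="blue4", lwd=4)
  points(c(1,n), c(df[1,i], df[n,i]), pch=16, cex=2, col="blue4")
  mtext(df[1,i], side=2, at=df[1,i], las=2)
  mtext(df[n,i], side=4, at=df[n,i], las=2)

  title(colnames(df)[i])

  # year labels only on the bottom row (panels 3 and 4)
  if (i >= 3) {
    axis(side=1, at=1:n, tick=FALSE, labels=rownames(df))
  }
}

# back to a single panel
par(op)
```

With mfrow there is no screen counter to keep track of, because the panel order follows the loop; split.screen() is the better fit when you need to jump back to an earlier panel.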
So which do you prefer?