polar histogram: pretty and useful

[This article was first published on Christophe Ladroue » R, and kindly contributed to R-bloggers].

Do you have tens of histograms to show but no room to put them all on the page? As I was reading this paper in Nature Genetics, I came across a simple and clever way of packing all this information in a small space: arrange them all around a circle, and add some guides to help their cross-comparison.

It didn’t look too difficult to implement in ggplot2 thanks to polar coordinates, and after a busy Saturday afternoon I ended up with the following image from my data (*) (and a poster-ready PDF, after 2 seconds of prettying up with Inkscape):

The graph shows the proportion of some SNP scores (‘first’, ‘second’ and ‘third’) for a number of phenotypes, which are grouped by themes. I’m quite happy with the result. It’s pretty and useful: it’s very easy to compare one histogram with any of the other 60.

The code is still a bit rough around the edges; a few things are not terribly elegant or are hard-coded. An improved version will be shipped with our graphical package next month. In the meantime, here it is, if you want to try it with your own data.

It returns a ggplot object containing the graph. You can display it with print(), save it as a PDF with ggsave("myPlot.pdf"), or modify it with the usual ggplot2 commands. I’ve called it a polar histogram, which I think is self-explanatory. If you know what it’s actually called, please let me know. (No, I will not call it polR histogram.)
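The three options above can be sketched as follows. This is only an illustration: it uses a small stand-in plot rather than the output of polarHistogram(), and the names standIn, outFile and p2 are made up for the example.

```r
library(ggplot2)

# Stand-in plot: polarHistogram() is not needed for this illustration.
standIn <- data.frame(item = c("a", "b", "c"), value = c(1, 2, 3))
p <- ggplot(standIn, aes(x = item, y = value)) +
  geom_bar(stat = "identity") +
  coord_polar()

# 1. display it on the current graphics device
print(p)

# 2. save it as a PDF (written to a temporary directory here)
outFile <- file.path(tempdir(), "myPlot.pdf")
ggsave(outFile, plot = p, width = 6, height = 6)

# 3. modify it like any other ggplot object
p2 <- p + ggtitle("My polar histogram")
```

Passing the plot explicitly through the plot argument of ggsave() avoids depending on whichever plot happened to be displayed last.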

And here is some fake data to get you going:

# fake data for polarHistogram()
# Christophe Ladroue
library(plyr)
library(reshape) # melt() with variable_name comes from reshape; loaded here in case polarHistogram.R does not load it
library(ggplot2)
source("polarHistogram.R")

# a little helper that generates random names for families and items.
randomName<-function(n=1,syllables=3){
  vowels<-c("a","e","i","o","u","y")
  consonants<-setdiff(letters,vowels)
  # alternate consonants and vowels, then glue them into one word
  replicate(n,
            paste(
              rbind(sample(consonants,syllables,replace=TRUE),
                    sample(vowels,syllables,replace=TRUE)),
              sep='',collapse='')
            )
}

set.seed(42)

nFamily<-20
nItemPerFamily<-sample(1:6,nFamily,replace=TRUE) # between 1 and 6 items per family
nValues<-3

df<-data.frame(
  family=rep(randomName(nFamily),nItemPerFamily),
  item=randomName(sum(nItemPerFamily),2))

# one column of random scores per value
df<-cbind(df,as.data.frame(matrix(runif(nrow(df)*nValues),nrow=nrow(df),ncol=nValues)))

df<-melt(df,c("family","item"),variable_name="score") # from wide to long
p<-polarHistogram(df,familyLabels=FALSE) # argument name as given under Options
print(p)
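If the wide-to-long step is unfamiliar, here is a rough base R sketch of what it produces on a tiny made-up table. It is not the melt() implementation, just an illustration of the resulting layout (columns family, item, score and value); the data and the names wide, scoreCols and long are invented for the example.

```r
# A tiny wide table: one row per item, one column per score.
wide <- data.frame(
  family = c("a", "a", "b"),
  item   = c("i1", "i2", "i3"),
  V1     = c(0.1, 0.2, 0.3),
  V2     = c(0.4, 0.5, 0.6),
  V3     = c(0.7, 0.8, 0.9)
)

# Every column that is not an identifier is treated as a score.
scoreCols <- setdiff(names(wide), c("family", "item"))

# Long table: one row per (item, score) pair.
long <- data.frame(
  family = rep(wide$family, times = length(scoreCols)),
  item   = rep(wide$item,   times = length(scoreCols)),
  score  = factor(rep(scoreCols, each = nrow(wide)), levels = scoreCols),
  value  = unlist(wide[scoreCols], use.names = FALSE)
)

print(long)
```

Each of the 3 items ends up with one row per score, so the long table has 9 rows.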

Options:
Many defaults can already be changed; look at the code for the complete list. The two you are most likely to change are familyLabels (logical), which controls whether the name of each group is displayed as well, and direction, which is either "inwards" or "outwards".

Coding notes:
It wasn’t terribly difficult but it did take me a bit longer than expected, for a few reasons:

  1. coord_polar() doesn’t affect the orientation of geom_text() so it had to be calculated manually.
  2. You’ll notice that the label orientations change between 6 and 9 o’clock, or they would end up upside down and be difficult to read.
  3. There are some scoping issues with plyr and ggplot2 which can be a bit annoying once you encapsulate your code in a function. For example:

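    library(ggplot2)

    df<-data.frame(
      x=runif(10),
      y=runif(10))

    z<-10
    ggplot(df)+geom_point(aes(x=x+z,y=y)) # works: z is a global variable

    rm(z)
    fakeFunction<-function(df){
      z<-10 # z now only exists inside the function
      ggplot(df)+geom_point(aes(x=x+z,y=y))
    }

    fakeFunction(df) # error with the ggplot2 version used here: z is not found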
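One way to deal with the three points above at once is to compute everything the plot needs as columns of the data frame before calling ggplot(), so that aes() never refers to a variable local to the function. The sketch below does that for label angles. It is not the code used in polarHistogram(): labelAngle() and labelledPolarPlot() are hypothetical names, the sketch assumes the circle starts at 12 o'clock and runs clockwise (the coord_polar() defaults), and it flips every label on the left half of the circle, which may differ from the rule used for the figure above.

```r
library(ggplot2)

# Hypothetical helper: angle and justification for a label at position x
# on a circle of xmax units, starting at 12 o'clock and running clockwise.
labelAngle <- function(x, xmax) {
  theta <- 360 * x / xmax   # clockwise angle from 12 o'clock, in degrees
  flip  <- theta > 180      # labels on the left half would be upside down
  data.frame(
    angle = ifelse(flip, 270 - theta, 90 - theta), # flipped labels are turned by 180 degrees
    hjust = ifelse(flip, 1, 0)                     # keep the text growing outwards
  )
}

# Hypothetical plotting function: everything aes() uses is a column of df,
# so nothing depends on variables that only exist inside the function.
labelledPolarPlot <- function(df) {
  n <- nrow(df)
  df$xpos <- seq_len(n) - 0.5   # centre of each bar
  df <- cbind(df, labelAngle(df$xpos, n))
  ggplot(df, aes(x = xpos, y = value)) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = item, angle = angle, hjust = hjust)) +
    scale_x_continuous(limits = c(0, n), expand = c(0, 0)) +
    coord_polar()
}

toy <- data.frame(
  item  = c("north-east", "south-east", "south-west", "north-west"),
  value = c(1, 2, 3, 2)
)
p <- labelledPolarPlot(toy)
print(p)
```

Keeping the angle calculation in its own small function also makes it easy to check a few positions by hand before looking at the plot.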

Happy plotting!

(*) The numbers are fudged; don’t spend time reverse-engineering them.

Update (24/03/2012):
Christos Hatzis has modified my original code to plot a collection of un-normalised bar charts, like this.

He’s happy to share his code here: PolarBarchart.zip, together with a test file.

Update (02/06/2012):
You can find a better version in my R package ‘phorest’.
