Split violin plots
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Violin plots are useful for comparing distributions. When data are grouped by a factor with two levels (e.g. males and females), you can split the violins in half to see the difference between groups. Consider a 2 x 2 factorial experiment: treatments A and B are crossed with groups 1 and 2, with N=1000.
1 2 3 4 5 6 7 8 9 | # Simulate data n.each <- 1000 A1 <- rnorm(n.each, 2, 1) A2 <- rnorm(n.each, 1.5, 2) B1 <- rnorm(n.each, 4, 1.5) B2 <- rnorm(n.each, 0, 1) values <- c(A1, A2, B1, B2) treatment <- rep(c("A", "B"), each=n.each*2) group <- rep(c(1, 2, 1, 2), each=n.each) |
Boxplots are often used:
1 2 | par(bty="n") boxplot(values ~ group*treatment, main="Box plot", col=rep(c("purple", "lightblue"), 2)) |
This gives us a rough comparison of the distribution in each group, but sometimes it’s nice to visualize the kernel density estimates instead.
I recently ran into this issue and tweaked the vioplot() function from
the vioplot
package by Daniel Adler to make split violin plots.
With vioplot2(), the side
argument specifies whether to plot the density on “both”, the “left”, or
the “right” side.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | require(vioplot) require(devtools) require(digest) source_gist("https://gist.github.com/mbjoseph/5852613") plot(x=NULL, y=NULL, xlim = c(0.5, 2.5), ylim=c(min(values), max(values)), type="n", ann=FALSE, axes=F) axis(1, at=c(1, 2), labels=c("A", "B")) axis(2) for (i in unique(treatment)) { for (j in unique(group)){ vioplot2(values[which(treatment == i & group == j)], at = ifelse(i == "A", 1, 2), side = ifelse(j == 1, "left", "right"), col = ifelse(j == 1, "purple", "lightblue"), add = T) } } title("Violin plot", xlab="Treatment") legend("bottomright", fill = c("purple", "lightblue"), legend = c("Group 1", "Group 2"), box.lty=0) |
Last but not least, Peter Kampstra’s beanplot package uses beanplot() to make split density plots, but 1) plots a rug rather than a quantile box, 2) includes a line for the overall mean or median, and 3) makes it easier to change the kernel function.
1 2 3 4 5 6 7 8 9 | require(beanplot) beanplot(values ~ group*treatment, ll = 0.04, main = "Bean plot", side = "both", xlab="Treatment", col = list("purple", c("lightblue", "black")), axes=F) axis(1, at=c(1, 2), labels=c("A", "B")) axis(2) legend("bottomright", fill = c("purple", "lightblue"), legend = c("Group 1", "Group 2"), box.lty=0) |
There are more ways than one to skin a cat, and what one uses will probably come to personal preference.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.