
Box plot, Fisher’s style

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers.]

In a recent issue of Significance, I discovered an interesting, and amusing, figure of a box-and-beard plot, in Dr Fisher’s casebook: Beard the statistician in his den.

In French, the box plot (introduced by John Tukey, not George Box, as discussed in a previous post) is known as the boîte à moustaches (literally, box with a mustache).

> set.seed(2)
> x=rnorm(500)
> boxplot(x,horizontal=TRUE,axes=FALSE)
> axis(1)

I was wondering whether it was possible to reproduce that Fisher-style box plot, with a beard (Ronald Fisher was famous for his beard). Technically, it is not complicated:

> boxplot(x,horizontal=TRUE,xlim=c(-1,1.3),axes=FALSE)
> axis(1)
> Q=quantile(x,c(.25,.75))
> y=cut(x[(x>=Q[1])&(x<=Q[2])],seq(Q[1],Q[2],length=11))
> tb=table(y)
> u=seq(Q[1],Q[2],length=11)
> umid=(u[1:10]+u[2:11])/2
> for(i in 1:10) segments(umid[i],1-.2,umid[i],1-.2-tb[i]/20,lwd=3)

Yes, I kept the mustaches here but, plotting a simple box instead, one could easily draw a goatee, or a chin strip:

> rect(Q[1],1-.2,Q[2],1+.2)
> segments(median(x),1-.2,median(x),1+.2,lwd=2)
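The two calls above assume that a plot is already open. As a minimal sketch (not from the original figure), the box-only version can be drawn on an empty canvas; the call to `plot(..., type = "n")` and the `include.lowest = TRUE` argument are assumptions added here, and the vertical range simply reuses the `c(-1, 1.3)` of the earlier `boxplot` call.

```r
set.seed(2)
x <- rnorm(500)

# Quartiles, and 10 equal-width bins between them
Q <- quantile(x, c(.25, .75), names = FALSE)
u <- seq(Q[1], Q[2], length.out = 11)
umid <- (u[1:10] + u[2:11]) / 2
# include.lowest = TRUE keeps a value that would sit exactly on Q[1]
tb <- table(cut(x[x >= Q[1] & x <= Q[2]], u, include.lowest = TRUE))

# Empty canvas (assumption: same vertical range as the boxplot call above)
plot(range(x), c(-1, 1.3), type = "n", axes = FALSE, xlab = "", ylab = "")
axis(1)
rect(Q[1], 1 - .2, Q[2], 1 + .2)                          # the box, no whiskers
segments(median(x), 1 - .2, median(x), 1 + .2, lwd = 2)   # the median
segments(umid, 1 - .2, umid, 1 - .2 - as.vector(tb) / 20, lwd = 3)  # the beard
```

Since `segments` is vectorized, the loop over the ten bars is not needed here.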

The only problem is that, between the first and the third quartiles, the distribution is much more ‘uniform’ than it might look on the figure in Significance. Unless, perhaps, we add more bars to the left of the first quartile, and to the right of the third one:
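To see how flat the beard should be in theory, here is a small sketch (an illustration added here, assuming a standard normal distribution and the same 10 equal-width bins between the quartiles). It computes the probability mass of each bin and compares the outermost bins with the central ones.

```r
# Theoretical quartiles of the standard normal distribution
q <- qnorm(c(.25, .75))
# 10 equal-width bins between the quartiles, as in the beard above
breaks <- seq(q[1], q[2], length.out = 11)
# Probability mass in each bin
p <- diff(pnorm(breaks))
# Bar heights relative to the tallest (central) bars
round(p / max(p), 2)
```

The outermost bars come out at a bit more than 80% of the height of the central ones, so the beard between the quartiles can never look very pointed.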

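> du=diff(umid)[1]   # bin width used between the quartiles
> # extend the grid by 8 bins on each side: 27 breaks, 26 bins of width du
> y=cut(x,seq(Q[1]-du*8,Q[2]+du*8,length=11+16))
> tb=table(y)
> u=seq(Q[1]-du*8,Q[2]+du*8,length=11+16)
> umid=(u[1:26]+u[2:27])/2
> # bins 1 to 8 lie left of the first quartile, bins 19 to 26 right of the third
> for(i in 1:8) segments(umid[i],1,umid[i],1-.2-tb[i]/20,lwd=3)
> for(i in 19:26) segments(umid[i],1,umid[i],1-.2-tb[i]/20,lwd=3)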

Of course, we can try to add a smile on that face, below the mustache,

> vu=seq(-2.5,2.5,by=.02)
> vv=dnorm(vu)
> lines(vu,1-vv*4,col="red",lty=2)
> d=density(x,bw=.1)
> lines(d$x,1-d$y*4,col="red")

So far, I am not convinced. I mean, it should be possible to add something to the box plot to convey more information. But I still wonder what… maybe using colors?
