R Tutorial: Add confidence intervals to dotchart
[This article was first published on Maximize Productivity with Industrial Engineer and Operations Research Tools, and kindly contributed to R-bloggers].
Recently I was working on a data visualization project. I wanted to visualize summary statistics for each category of the data. Specifically, I wanted a simple view of each category's center and dispersion: the mean with an interval around it.
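R is my tool of choice for data visualization. My audience was a general one, so I didn't want to use boxplots, density plots, or similar distribution charts. I wanted a simple mean with a 95% interval of roughly 2 standard deviations on either side of it. (Strictly speaking, the mean ± 2 standard deviations is the range expected to hold about 95% of individual observations when the data are roughly normal; a 95% confidence interval for the mean itself is narrower, because it is based on the standard error, the standard deviation divided by √n.) My method of choice was the dotchart function. However, dotchart only draws the data points, not the dispersion of the data, so I needed to layer the intervals on top.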
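To make the distinction between the two intervals concrete, here is a small sketch (not part of the original post) that computes both for mpg by number of cylinders. The helper name group_intervals is hypothetical. The "band" columns are mean ± 2 SD, describing where individual cars fall; the "ci" columns are a t-based 95% confidence interval for each group mean, using qt() with n - 1 degrees of freedom.

```r
# Hypothetical helper: per-group mean with a +/- 2 SD band and a t-based CI for the mean
group_intervals <- function(values, groups, level = 0.95) {
  m <- tapply(values, groups, mean)    # group means
  s <- tapply(values, groups, sd)      # group standard deviations
  n <- tapply(values, groups, length)  # group sizes
  # Two-sided critical value from the t distribution with n - 1 degrees of freedom
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  se <- s / sqrt(n)                    # standard error of each mean
  data.frame(
    mean    = as.vector(m),
    sd      = as.vector(s),
    n       = as.vector(n),
    # Roughly 95% of individual observations fall here if the data are near-normal
    band_LL = as.vector(m - 2 * s),
    band_UL = as.vector(m + 2 * s),
    # 95% confidence interval for the group mean
    ci_LL   = as.vector(m - tcrit * se),
    ci_UL   = as.vector(m + tcrit * se),
    row.names = names(m)
  )
}

res <- group_intervals(mtcars$mpg, mtcars$cyl)
print(round(res, 2))
```

Either pair of columns can be fed to the plotting code as the lower and upper limits; the confidence-interval bars will simply be shorter.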
The great thing about R's base graphics is that plots are built in layers: a high-level function such as dotchart() or plot() creates the plot, and low-level functions such as lines(), points(), and abline() draw on top of it. So I knew I could use the lines function to add the intervals to the existing dotchart. This method worked great for my simple plot and adds another tool to my R toolbox.
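To illustrate the layering idea on its own, here is a minimal sketch (not from the original post) using mtcars: one high-level plot() call followed by low-level abline(), lines(), and points() calls that draw on the same plot. The pdf(NULL) null device is an assumption added only so the snippet runs without a screen; drop it and dev.off() to draw interactively.

```r
# Null graphics device: produces no file, lets the sketch run non-interactively
pdf(NULL)

# High-level call: starts a new plot and fixes the axis ranges
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Low-level calls: each one draws on top of the current plot without clearing it
abline(h = mean(mtcars$mpg), lty = 2)            # horizontal line at the overall mean
lines(x = c(2, 4), y = c(30, 30), col = "blue")  # a horizontal segment, like an interval bar
points(3, 30, pch = 19, col = "blue")            # a point in the middle of that segment

# The coordinate ranges set up by plot() are left unchanged by the low-level calls
usr <- par("usr")
dev.off()
```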
Here is the example R script for a dotchart with confidence intervals, using the “mtcars” dataset that ships with every R installation.
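### Mean and standard deviation of mpg for each number of cylinders (4, 6, 8)
x <- data.frame(mean = tapply(mtcars$mpg, mtcars$cyl, mean), sd = tapply(mtcars$mpg, mtcars$cyl, sd))
### Add lower and upper levels of the intervals (mean -/+ 2 standard deviations)
x$LL <- x$mean - 2 * x$sd
x$UL <- x$mean + 2 * x$sd
### Plot the dotchart with the intervals
main_title <- "MPG by Num. of Cylinders with 95% Confidence Intervals"
### Round the x-axis limits out to multiples of 10 so every interval fits
x_limits <- c(floor(min(x$LL) / 10) * 10, ceiling(max(x$UL) / 10) * 10)
### labels = row names (4, 6, 8); color (not col) sets the dot color in dotchart()
dotchart(x$mean, labels = rownames(x), color = "blue", xlim = x_limits,
         main = main_title, xlab = "Miles per gallon")
### dotchart() draws the i-th value at height y = i, so draw the i-th interval there too
for (i in seq_len(nrow(x))) {
  lines(x = c(x$LL[i], x$UL[i]), y = c(i, i))
}
grid()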
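The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not the author's code: the helper name dotchart_ci is hypothetical, and it swaps the for loop for a single vectorized segments() call, which draws all the bars at once at heights 1, 2, ..., n where an ungrouped dotchart() places its values. pdf(NULL) is again only there so the example runs headless.

```r
# Hypothetical helper: dotchart of centers with horizontal interval bars
dotchart_ci <- function(center, lower, upper, labels = names(center),
                        color = "blue", main = NULL, xlab = NULL) {
  stopifnot(length(center) == length(lower), length(center) == length(upper))
  # Widen the x-axis so every interval fits on the plot
  dotchart(center, labels = labels, color = color, pch = 19,
           xlim = range(lower, upper), main = main, xlab = xlab)
  # Ungrouped dotchart() draws value i at height y = i, so bar i goes there too
  y <- seq_along(center)
  segments(x0 = lower, y0 = y, x1 = upper, y1 = y)  # one call draws every bar
  invisible(y)
}

m <- tapply(mtcars$mpg, mtcars$cyl, mean)
s <- tapply(mtcars$mpg, mtcars$cyl, sd)

pdf(NULL)  # null device so the sketch runs without a screen
y <- dotchart_ci(m, m - 2 * s, m + 2 * s,
                 main = "MPG by number of cylinders (mean +/- 2 SD)",
                 xlab = "Miles per gallon")
dev.off()
```

If capped error bars are preferred, arrows(lower, y, upper, y, angle = 90, code = 3, length = 0.05) can replace the segments() call.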
And here is the example of the finished product.