Plotting Likert Scales
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Graphs can provide an excellent way to emphasize a point and to show important information quickly and efficiently. Poor graphs, on the other hand, waste space in an article, take up time in a presentation, and use a lot of ink while providing little or no information.
Excel has made it possible to make all sorts of graphs. However, just because a graph looks like a spider web (a radar chart) or like something you can eat for dessert (a pie chart) doesn’t mean you should use it.
This post shows five ways to graph Likert scale data, discusses best and common practice for each, and provides the R code for every graph. The approaches come from a list I compiled of the methods that the people I have worked with use to graph and interpret Likert scales within their organizations.
Likert scales usually have 5 or 7 response options. The exact number, and whether it should be odd or even, is a topic for another psychometric discussion. A typical five-point Likert scale is:
1 Strongly Agree
2 Agree
3 Neutral
4 Disagree
5 Strongly Disagree
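Because the codes 1 to 5 only carry order, one convenient way to store such responses in R is as an ordered factor. The sketch below assumes a short hypothetical vector `responses`; the labels follow the scale listed above, and declaring all five levels keeps options that nobody chose.

```r
# Hypothetical responses coded 1-5 as in the scale above
responses <- c(1, 2, 2, 3, 5, 4, 1)

likert.labels <- c("Strongly Agree", "Agree", "Neutral",
                   "Disagree", "Strongly Disagree")

# An ordered factor keeps the category order and all five levels,
# even when some option was never chosen
resp.f <- factor(responses, levels = 1:5, labels = likert.labels, ordered = TRUE)

# Frequency of each response option, in scale order
table(resp.f)
```

Comparisons such as `resp.f[5] > resp.f[1]` then respect the scale order, while arithmetic like `mean()` is deliberately not defined for factors.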
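For example purposes I generated some random discrete data formatted as Likert scale responses. There are three groups of 100 respondents each, chosen to show the extremes of Likert scale responses: group 1 is polarized toward both ends of the scale, group 2 is concentrated on Neutral, and group 3 is spread uniformly across all five options.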
set.seed(1234)
library(e1071)  # provides rdiscrete()

# Probabilities of responses 1-5, one column per group:
#   group 1 polarized (40% at each extreme), group 2 mostly Neutral (90%),
#   group 3 uniform
probs <- cbind(c(.4, .2/3, .2/3, .2/3, .4),
               c(.1/4, .1/4, .9, .1/4, .1/4),
               c(.2, .2, .2, .2, .2))
my.n <- 100  # respondents per group

# raw: column 1 = group, column 2 = response (1-5)
raw <- NULL
for (i in 1:ncol(probs)) {
  raw <- rbind(raw, cbind(i, rdiscrete(my.n, probs = probs[, i], values = 1:5)))
}

# Group means with a 95% normal-approximation confidence interval
grp.mean <- tapply(raw[, 2], raw[, 1], mean)
grp.se   <- sqrt(tapply(raw[, 2], raw[, 1], var) / tapply(raw[, 2], raw[, 1], length))
z <- qnorm(1 - .05/2, 0, 1)
r <- data.frame(group = as.numeric(names(grp.mean)),
                mean  = grp.mean,
                ll    = grp.mean - z * grp.se,   # lower limit
                ul    = grp.mean + z * grp.se)   # upper limit

# Counts of each response (rows) by group (columns); tapply() returns NA
# for a response nobody in a group chose, so set those cells to 0
gbar <- tapply(raw[, 2], list(raw[, 2], raw[, 1]), length)
gbar[is.na(gbar)] <- 0
sgbar <- data.frame(cbind(1:ncol(gbar), t(gbar)))
sgbar.likert <- sgbar[, 2:6]
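If you would rather not depend on e1071, base R's `sample()` with its `prob` argument draws the same kind of discrete responses. This is a swapped-in technique, not the author's code: the result is stored in a hypothetical `raw2` so the original `raw` is untouched, and the draws will not match those from `rdiscrete()` even with the same seed.

```r
set.seed(1234)

# Same response probabilities as above (one column per group)
probs <- cbind(c(.4, .2/3, .2/3, .2/3, .4),
               c(.1/4, .1/4, .9, .1/4, .1/4),
               rep(.2, 5))
my.n <- 100  # respondents per group

# Base-R alternative to e1071::rdiscrete(): sample() draws values 1-5
# with replacement, weighted by each group's probabilities
raw2 <- do.call(rbind, lapply(1:ncol(probs), function(i) {
  cbind(group = i,
        response = sample(1:5, my.n, replace = TRUE, prob = probs[, i]))
}))

# Counts of each response by group; table() reports empty cells as 0
table(raw2[, "response"], raw2[, "group"])
```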
Diverging Stacked Bar Chart
Diverging stacked bar charts are often the best choice for visualizing Likert scale data. There are various ways to produce them, but I have found the easiest approach is the likert() function in the HH package, which can produce many variations of the chart. Three of them are shown here.
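library(grid)
library(lattice)
library(latticeExtra)
library(HH)  # provides likert()

sgbar.likert <- sgbar[, 2:6]  # response counts only (drop the group column)

# 1. Default horizontal layout
likert(sgbar.likert,
       main = "Example Diverging Stacked Bar Chart for Likert Scale",
       sub = "Likert Scale")

# 2. Vertical bars with a one-column legend on the right
likert(sgbar.likert, horizontal = FALSE, aspect = 1.5,
       main = "Example Diverging Stacked Bar Chart for Likert Scale",
       auto.key = list(space = "right", columns = 1, reverse = TRUE, padding.text = 2),
       sub = "Likert Scale")

# 3. Single-hue "Blues" palette; each group has 100 respondents,
#    so the counts can be read directly as percentages
likert(sgbar.likert,
       auto.key = list(between = 1, between.columns = 2),
       xlab = "Percentage",
       main = "Example Diverging Stacked Bar Chart for Likert Scale",
       BrewerPaletteName = "Blues",
       sub = "Likert Scale")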
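To make the diverging layout itself concrete, here is a minimal base-graphics sketch of the idea, not how HH implements it. It assumes hypothetical counts that mirror the three probability profiles used above (not the simulated data) and one common convention: each bar is shifted so that the middle of the Neutral segment sits at zero, with the two "agree" options to the left and the two "disagree" options to the right. The colors are arbitrary.

```r
# Hypothetical counts (rows = response options, columns = groups)
counts <- matrix(c(40,  7,  6,  7, 40,
                    3,  2, 90,  2,  3,
                   20, 20, 20, 20, 20),
                 nrow = 5,
                 dimnames = list(c("Strongly Agree", "Agree", "Neutral",
                                   "Disagree", "Strongly Disagree"),
                                 paste("Group", 1:3)))
pct <- 100 * prop.table(counts, 2)  # percentages within each group

# Left edge of every segment for a 5-point scale with Neutral in position 3:
# options 1-2 plus half of Neutral go to the left of zero
segment.starts <- function(p) {
  left <- -(p[1] + p[2] + p[3] / 2)
  unname(c(left, left + cumsum(p)[-length(p)]))
}
starts <- apply(pct, 2, segment.starts)  # 5 x 3 matrix
ends   <- starts + pct

cols <- c("#2166AC", "#92C5DE", "grey85", "#F4A582", "#B2182B")
par(mar = c(5, 6, 4, 2) + 0.1)  # room for the group labels
plot(NA, xlim = c(-100, 100), ylim = c(0.5, ncol(pct) + 0.5),
     yaxt = "n", xlab = "Percent", ylab = "",
     main = "Diverging stacked bars (base graphics sketch)")
axis(2, at = 1:ncol(pct), labels = colnames(pct), las = 1)
for (g in 1:ncol(pct)) {
  rect(starts[, g], g - 0.35, ends[, g], g + 0.35, col = cols, border = "white")
}
abline(v = 0, lty = 2)  # midpoint of the Neutral segment
legend("topright", rownames(pct), fill = cols, cex = 0.7, bty = "n")
```

Centering on Neutral is what lets the eye compare the agree and disagree sides across groups; other conventions drop Neutral into a separate bar instead.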
Mean Value
Often researchers simply treat each response option as a real number. This makes it very convenient to calculate the mean, the standard deviation, and confidence intervals, which is particularly useful when working with non-analytical clients. However, it is a controversial practice: it treats ordinal data as interval data, and that rests on statistical assumptions that may not hold. For starters, the response options need to be equidistant from each other. For example, is the distance from Strongly Agree to Agree the same as the distance from Disagree to Strongly Disagree? That may be true for one question but not for every question on a questionnaire, since different questions and wordings are likely to produce different distributions. Furthermore, the confidence intervals rely on a normality assumption that may also be incorrect.
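For reference, the interval computed in the data-preparation code (with `var()`, `length()` and `qnorm(1-.05/2,0,1)`) is the normal-approximation interval for each group $g$, where $\bar{x}_g$, $s_g^2$ and $n_g$ are the group's mean, sample variance and number of respondents:

$$
\bar{x}_g \pm z_{1-\alpha/2}\sqrt{\frac{s_g^2}{n_g}}, \qquad \alpha = 0.05,\quad z_{0.975} \approx 1.96
$$

Both the mean and the interval only make sense once the codes 1 to 5 are treated as equally spaced numbers, which is exactly the assumption questioned above.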
What makes matters worse is that if 25 of 50 respondents mark Strongly Disagree and the other 25 mark Strongly Agree, the mean is 3, implying that on average the responses are neutral. That clearly does not describe the data. Ultimately, it is up to the statistician to work with the client to choose a method that conveys the message appropriately and that both parties can agree on.
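The 25/25 example above is easy to check. The sketch compares it with a hypothetical all-Neutral sample of the same size: the mean and even the median are identical, while the standard deviation and the frequency table reveal the difference.

```r
# The polarized sample from the paragraph above (1 = Strongly Agree, 5 = Strongly Disagree)
polarized <- c(rep(1, 25), rep(5, 25))
# Hypothetical comparison sample: all 50 respondents answer Neutral
neutral <- rep(3, 50)

# Same center for both samples...
c(mean(polarized), mean(neutral))
c(median(polarized), median(neutral))

# ...but the spread and the full distribution tell them apart
c(sd(polarized), sd(neutral))
table(factor(polarized, levels = 1:5))
table(factor(neutral, levels = 1:5))
```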
par(mfrow = c(1, 1))
# Group means (blue) on the 1-5 response scale; axes are drawn separately below
plot(r$group, r$mean, type = "o", cex = 1, col = "blue", pch = 16,
     ylim = c(1, 5), lwd = 2,
     ylab = "Mean Value", xlab = "Group",
     main = "Likert Scale Mean Values Example",
     xaxt = "n", yaxt = "n")
axis(1, at = 1:3, tcl = -0.7, lty = 1, lwd = 0.8, labels = TRUE)
axis(2, at = 1:5, labels = TRUE, tcl = -0.7, lty = 1)
abline(h = 1:5, col = "grey")  # reference line at each response option
# 95% confidence limits (red)
lines(r$group, r$ll, col = "red", lwd = 2)
lines(r$group, r$ul, col = "red", lwd = 2)
legend("topright", c("Mean", "Confidence Interval"), col = c("blue", "red"),
       title = "Legend", lty = 1, lwd = 2, inset = .05)
Pies and Multiple Pies
A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between charts (…). Given their low density and failure to order numbers along a visual dimension, pie charts should never be used. — Edward Tufte
Pie charts are notoriously poor at conveying the information they are intended to show. I never use them; there are far better ways to visualize data. I have heard people give reasons for using them that are somewhat justified, and those reasons are generally some version of the ‘eye candy’ argument. But if the goal is a graph that both provides information and looks good, a 3-D pie chart is probably not the best choice. I debated whether to include the R code for these examples at all, but in the interest of full disclosure, here it is.
likert.labels <- c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")

# Single pie for group 1; factor() keeps all five options even if one was never chosen
my.table <- table(factor(raw[, 2][raw[, 1] == 1], levels = 1:5, labels = likert.labels))
labl <- paste(names(my.table), "\n", my.table, sep = "")
pie(my.table, labels = labl, main = "Example Pie Chart of Likert Scale")

# One pie per group, side by side
num.groups <- length(unique(raw[, 1]))
par(mfrow = c(1, num.groups))
for (j in 1:num.groups) {
  my.table <- table(factor(raw[, 2][raw[, 1] == j], levels = 1:5, labels = likert.labels))
  labl <- paste(names(my.table), "\n", my.table, sep = "")  # this group's own counts
  pie(my.table, labels = labl,
      main = paste("Example Pie Chart of\nLikert Scale Group ", j))
}

# 3-D pie of the last group's counts (my.table from the final loop iteration)
library(plotrix)  # provides pie3D()
par(mfrow = c(1, 1))
labl <- paste(names(my.table), "\n", my.table, sep = "")
pie3D(as.numeric(my.table), labels = labl, explode = 0.1, main = "3D Pie Chart Example")
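In the spirit of the Tufte quote, the same information the pies try to show fits in a small percentage table. This sketch uses hypothetical counts that mirror the three groups (not the simulated data); `prop.table(..., 2)` converts each group's column to shares of that group.

```r
# Hypothetical counts (rows = response options, columns = groups)
counts <- matrix(c(40,  7,  6,  7, 40,
                    3,  2, 90,  2,  3,
                   20, 20, 20, 20, 20),
                 nrow = 5,
                 dimnames = list(Response = c("Strongly Agree", "Agree", "Neutral",
                                              "Disagree", "Strongly Disagree"),
                                 Group = paste("Group", 1:3)))

# Column percentages, rounded to one decimal, in place of several pies
pct.table <- round(100 * prop.table(counts, 2), 1)
print(pct.table)
```

Reading across a row compares one response option between groups, which is exactly the comparison that side-by-side pies make hard.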
Grouped Bar Chart
This is a nice approach when you want to look at each group and highlight a particular Likert response option. Here it is easy to see that in Group 2 the Neutral option is by far the most common response.
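par(mfrow = c(1, 1))
# Side-by-side bars: one cluster per group, one bar per response option
barplot(gbar, beside = TRUE, col = cm.colors(5),
        main = "Example Bar Chart of Counts by Group",
        xlab = "Group", ylab = "Frequency")
legend("topright",
       c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"),
       fill = cm.colors(5), title = "Legend", inset = .1)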
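The observation that Neutral dominates Group 2 can also be checked numerically. This sketch assumes hypothetical counts mirroring the three groups and reports each group's most frequent response with its share. Note that `which.max()` returns the first of tied maxima, so the polarized Group 1 reports Strongly Agree even though Strongly Disagree is equally common; a single "most common" answer hides that, while the grouped bar chart shows it.

```r
# Hypothetical counts (rows = response options, columns = groups)
counts <- matrix(c(40,  7,  6,  7, 40,
                    3,  2, 90,  2,  3,
                   20, 20, 20, 20, 20),
                 nrow = 5,
                 dimnames = list(c("Strongly Agree", "Agree", "Neutral",
                                   "Disagree", "Strongly Disagree"),
                                 paste("Group", 1:3)))

# Row index of the most frequent response in each group (first one on ties)
modal <- apply(counts, 2, which.max)

modal.summary <- data.frame(
  group          = colnames(counts),
  modal.response = rownames(counts)[modal],
  share          = counts[cbind(modal, seq_len(ncol(counts)))] / colSums(counts)
)
print(modal.summary)
```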
Divided Bar Chart
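This isn’t a bad approach, and it is quite similar to the diverging stacked bar chart. It stacks the share of each response category within each group. Because every group here has 100 respondents, the counts plotted double as percentages.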
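library(ggplot2)
library(reshape2)  # provides melt()

names(sgbar) <- c("group", "Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")
# Long format: one row per group and response category
mx <- melt(sgbar, id.vars = 1)
names(mx) <- c("Group", "Category", "Percent")
# stat = "identity" stacks the supplied values instead of counting rows
ggplot(mx, aes(x = Group, y = Percent, fill = Category)) +
  geom_bar(stat = "identity")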
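When groups have different numbers of respondents, stacked counts no longer read as percentages. One option, sketched here with hypothetical data for two groups of 50 and 200 respondents, is ggplot2's `position = "fill"`, which rescales each stacked bar to a total of 1 so groups are compared as proportions.

```r
library(ggplot2)

likert.levels <- c("Strongly Agree", "Agree", "Neutral",
                   "Disagree", "Strongly Disagree")

# Hypothetical long-format counts with unequal group sizes (not the simulated data)
mx <- data.frame(
  Group    = factor(rep(c("A", "B"), each = 5)),
  Category = factor(rep(likert.levels, 2), levels = likert.levels),
  Count    = c(10, 20,  5, 10,  5,    # group A: 50 respondents
               30, 60, 60, 30, 20)    # group B: 200 respondents
)

# position = "fill" rescales each bar to sum to 1; the axis labels show percent
p <- ggplot(mx, aes(x = Group, y = Count, fill = Category)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  labs(y = "Percent")
print(p)
```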