Fumblings with Ranked Likert Scale Data in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form:
 | enjoyCompany | tooMuchFamily |
1 | strongly agree | strongly disagree |
2 | strongly agree | strongly disagree |
3 | neither agree nor disagree | strongly disagree |
… | … | … |
That is, N rows, no identifiers, two columns; each column relates to a questionnaire question with a scaled response enumerated as ‘strongly agree’, ‘agree ’, ‘neither agree nor disagree’, ‘disagree’, ‘strongly disagree’ (note the trailing space in ‘agree ’, which is the form the code below uses).
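For anyone wanting to play along without the original file, here is a minimal sketch of a data frame of that shape. The values are invented for illustration (they are not the survey responses), and using an empty string for an unanswered question is an assumption based on the blank-value filtering that appears later in the post.

```r
# Hypothetical sample data: invented values, not the real survey responses
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')  # 'agree ' keeps its trailing space

set.seed(1)  # make the made-up sample reproducible
fd <- data.frame(
  enjoyCompany  = sample(c(scale_levels, ''), 20, replace = TRUE),
  tooMuchFamily = sample(c(scale_levels, ''), 20, replace = TRUE),
  stringsAsFactors = TRUE  # mimic columns that were read in as factors
)

# Two factor columns, no identifier column
str(fd)
```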
The first thing I tried to do was some “traditional” Likert scale style stacked bar charts using ggplot2 (surely there must be a Likert scale visualisation library around? If so, how would it work with data in the above (and below) forms? Answers via the comments please…)
require(reshape)
require(ggplot2)

#My sample data doesn't have row based identifiers, so here's a hacked incremental index based ID
fd$a=1
fd$b=cumsum(fd$a)
fd=subset(fd,select=c('enjoyCompany','tooMuchFamily','b'))

#melt the data into a dataframe with 3 cols: the id col, /b/;
#a /variable/ column that contains the original column heading;
#and a /value/ column that contains the original cell value for the corresponding row and column.
ff=melt(fd,id.var='b')

#Get rid of blank values
ff=subset(ff,value!='')

#Get rid of unused levels
ff$value=factor(ff$value)

##Check:
#levels(ff$value)

#Reorder the levels into a meaningful order
ff$value <- factor(ff$value, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))

ggplot(ff)+geom_bar(aes(variable,fill=value))+ coord_flip()
A couple of notable issues with the resulting diagram:
- the colours aren’t that pleasing to look at;
- we have lost all sense of correlation between values. We may like to think that the agree/strongly agree ratings from one question are correlated with the disagree/strongly disagree responses from the other, but there is nothing in that chart that says this for sure…
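On the colour point, one possible tweak (a sketch only, not something I tried at the time) is to swap the default fill colours for a diverging ColorBrewer palette via ggplot2's scale_fill_brewer(), so the two ends of the scale get contrasting hues. The long-format data here is invented to stand in for the melted data frame, and the choice of the "RdBu" palette is just an assumption about what might look better.

```r
# Hypothetical long-format data standing in for the melted data frame
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')
set.seed(2)
ff <- data.frame(
  variable = rep(c('enjoyCompany', 'tooMuchFamily'), each = 30),
  value    = sample(scale_levels, 60, replace = TRUE)
)

# Order the levels along the scale so the fill colours run from one end to the other
ff$value <- factor(ff$value, levels = rev(scale_levels))

# Only draw the chart if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ff) +
    geom_bar(aes(variable, fill = value)) +
    scale_fill_brewer(palette = "RdBu") +  # diverging palette: assumed choice
    coord_flip()
  print(p)
}
```

This only addresses the first complaint; the stacked bars still say nothing about how the two questions relate to each other.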
However, a pairwise comparison may help…
#Let's count how many times the different scale values occur with each other, and then plot some sort of correlation plot.
fs=as.data.frame(table(subset(fd,select=c('enjoyCompany','tooMuchFamily'))))
fs=subset(fs,enjoyCompany!='' & tooMuchFamily!='')
fs$enjoyCompany <- factor(fs$enjoyCompany, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
fs$tooMuchFamily <- factor(fs$tooMuchFamily, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(fs)+geom_point(aes(x=enjoyCompany,y=tooMuchFamily,size=Freq))
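The bubble plot shows how often pairs of answers co-occur, but it doesn't put a number on the relationship. One rough way to do that, assuming it is fair to treat the scale as ordinal, is a Spearman rank correlation on the integer codes of the ordered factors. The toy responses below are made up, and with this many ties the figure should be read as indicative at best.

```r
# Scale ordered from most negative to most positive ('agree ' keeps its trailing space)
scale_levels <- c('strongly disagree', 'disagree', 'neither agree nor disagree',
                  'agree ', 'strongly agree')

# Hypothetical toy responses, invented for illustration
toy <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree', 'agree ',
                    'neither agree nor disagree', 'disagree', 'strongly agree'),
  tooMuchFamily = c('strongly disagree', 'disagree', 'disagree',
                    'neither agree nor disagree', 'agree ', 'strongly disagree')
)

# Convert both columns to ordered factors that share the same level order
toy$enjoyCompany  <- factor(toy$enjoyCompany,  levels = scale_levels, ordered = TRUE)
toy$tooMuchFamily <- factor(toy$tooMuchFamily, levels = scale_levels, ordered = TRUE)

# Spearman correlation on the underlying integer codes (1 = strongly disagree ... 5 = strongly agree)
rho <- cor(as.integer(toy$enjoyCompany), as.integer(toy$tooMuchFamily),
           method = 'spearman')
print(rho)
```

A negative value here would match the hunch that agreeing with one question goes with disagreeing with the other, at least for this invented sample.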
If I had rather more than two question columns, how would I generate a lattice of pairwise correlation charts to get a visual overview of how all the question answers interact at the pairwise level?
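One possible approach, offered as a sketch rather than a tested answer: use combn() to enumerate every pair of question columns, build a count table for each pair, stack the tables into one long data frame, and let ggplot2's facet_wrap() draw one bubble panel per pair. The data is invented, and the third column, likeQuiet, is a hypothetical question added purely so there is more than one pair to plot.

```r
scale_levels <- c('strongly disagree', 'disagree', 'neither agree nor disagree',
                  'agree ', 'strongly agree')

# Hypothetical data: three question columns with made-up responses
set.seed(42)
qs <- data.frame(
  enjoyCompany  = sample(scale_levels, 50, replace = TRUE),
  tooMuchFamily = sample(scale_levels, 50, replace = TRUE),
  likeQuiet     = sample(scale_levels, 50, replace = TRUE)  # invented question
)
# Give every column the same ordered set of levels
qs[] <- lapply(qs, factor, levels = scale_levels)

# All unordered pairs of column names
pairs <- combn(names(qs), 2, simplify = FALSE)

# One count table per pair, stacked into a single long data frame
counts <- do.call(rbind, lapply(pairs, function(p) {
  tab <- as.data.frame(table(x = qs[[p[1]]], y = qs[[p[2]]]))
  tab$pair <- paste(p[1], 'vs', p[2])  # label used for the facet panel
  tab
}))

# Draw one bubble panel per pair, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(subset(counts, Freq > 0)) +
    geom_point(aes(x = x, y = y, size = Freq)) +
    facet_wrap(~ pair)
  print(p)
}
```

In each panel the x axis is the first question named in the panel title and the y axis is the second, so the axes mean different things from panel to panel.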