[This article was first published on Engaging Market Research, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Heatmaps, when the rows and columns are appropriately ordered, provide insight into the data structure at the individual level. In an earlier post I showed a cluster heatmap with dendrograms for both the rows and the columns. In addition, I provided an example of what a heatmap might look like if the underlying structure were a scalogram or a Guttman scale such as what we would expect to find in item response theory (IRT). Although it is not blood spatter analysis from crime scene investigation, heatmaps can assist in deciding whether the underlying heterogeneity is a continuous (IRT model) or discrete (finite mixture model) latent variable.
For example, in my last post I generated 200 observations on 8 binary items using a Rasch simulation from the R package psych. As a reminder, we were attempting to simulate the perceptions of hungry airline passengers as they passed by a Subway restaurant on their way to the terminal to board their airplane. Using a checklist, respondents were asked to indicate if the restaurant had good seating and menu selection, timely ordering and food preparation, plus tasty, filling, healthy, and fresh food.
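For readers who do not have the earlier post at hand, here is a minimal base-R sketch of how comparable data could be generated. It is a stand-in, not the psych call used in that post: the seed, the standard normal person distribution, the evenly spaced item difficulties, and the short item names are all assumptions chosen for illustration.

```r
# Base-R stand-in for a Rasch simulation (NOT the psych function from the post).
set.seed(42)                          # arbitrary seed, for reproducibility only
n_person <- 200
items <- c("seating", "menu", "ordering", "preparation",
           "tasty", "filling", "healthy", "fresh")   # assumed short labels

theta <- rnorm(n_person)              # person locations (assumed standard normal)
b <- seq(2, -2, length.out = length(items))   # assumed difficulties, hardest first

# Rasch model: P(X = 1) = logistic(theta_i - b_j)
p <- plogis(outer(theta, b, "-"))

# Draw one 0/1 response per person-item cell (column-major, matching p)
ToyData <- matrix(rbinom(length(p), 1, p), nrow = n_person,
                  dimnames = list(NULL, items))
```

Under these assumed difficulties, the service items are checked least often and the food items most often, which mirrors the ordering described for the Subway checklist.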
In order to show the underlying pattern of scores, we will need to sort both the rows and the columns by their marginal values. That is, one would calculate the total score across the 8 items for each respondent and sort by these marginal total scores. In addition, one would compute the column means across respondents for each of the 8 items and sort by these marginal item means.
In the above heatmap for our Rasch simulated data, we can see the typical Guttman scale pattern. As one moves from left to right, the items get easier; that is, the columns become bluer. Similarly, as one travels down the heatmap from the top, we find respondents with increasingly higher scores. Both of these findings are expected given that the rows and columns have been sorted by their marginals. However, what is revealing in the heatmap is the pattern with which the data matrix changes from red to blue. We call this pattern “cumulative” because respondents appear to score higher by adding items to their checklists. Only a few did not check any of the items. Those who checked only one item tended to say that the food was fresh. Healthy, filling and tasty were added next. Only those giving Subway the highest scores marked the first four service items.
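The cumulative pattern can also be checked numerically. The sketch below uses a tiny hypothetical checklist (5 respondents, 4 items, invented for illustration) and compares each sorted row with the ideal Guttman pattern, in which a respondent with a total score of k checks exactly the k easiest items. The agreement share is in the spirit of a reproducibility coefficient, though it is only one of several ways such errors can be counted.

```r
# Hypothetical checklist: cumulative except for one "error" in the last row.
X <- rbind(c(0, 0, 0, 0),
           c(0, 0, 0, 1),
           c(0, 0, 1, 1),
           c(0, 1, 1, 1),
           c(1, 0, 1, 1))
colnames(X) <- c("seating", "tasty", "healthy", "fresh")

# Sort rows by total score and columns by item mean, as for the heatmap.
Xord <- X[order(rowSums(X)), order(colMeans(X))]

# Ideal Guttman pattern: a score of k means the k easiest items are checked,
# i.e. the k rightmost columns after sorting.
k <- rowSums(Xord)
J <- ncol(Xord)
ideal <- t(sapply(k, function(s) as.integer(seq_len(J) > J - s)))

# Share of cells that agree with the ideal pattern (1 = perfect Guttman scale).
agreement <- mean(Xord == ideal)
agreement   # 0.9: the last respondent checked "seating" instead of "tasty"
```

Values close to 1 correspond to the clean red-to-blue staircase in the heatmap; lower values mean that the sorted matrix looks more speckled.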
The R code is straightforward when you use the heatmap.2 function from the R package gplots. We start with the 200 x 8 data matrix (called ToyData) created in my last post, calculate row and column marginals, and sort the data matrix by the marginals. Then, we call the gplots package and run the heatmap.2 function. As you might imagine, there are a lot of options. Rowv and Colv are set to FALSE so that the current order of the rows and columns will be maintained. There is no dendrogram because we are not clustering the rows and columns. I am using red and blue for the colors. I am adding a color key, but leaving out the row labels.
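# Column means (item easiness) and row totals (person scores)
item <- apply(ToyData, 2, mean)
person <- apply(ToyData, 1, sum)

# Sort rows by total score and columns by item mean
ToyDataOrd <- ToyData[order(person), order(item)]

library(gplots)

# Keep the sorted order (no clustering, no dendrogram), red-to-blue colors,
# show a color key, and suppress row labels
heatmap.2(ToyDataOrd, Rowv=FALSE, Colv=FALSE,
          dendrogram="none", col=redblue(16),
          key=TRUE, keysize=1.5, density.info="none",
          trace="none", labRow=NA)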
Why Does One Observe the Guttman Scale Pattern?
We find the Guttman scale pattern whenever there is a strong sequential or cumulative structure to the data (e.g., achievement test scores, physical impairment, cultural evolution, and political ideology). In the case of brand perceptions, we would only expect to see cumulative effects in well-formed product categories where there was universal agreement concerning the strengths and weaknesses of brands in the category.
In order to use an item response model, there must be sufficient constraints so that there is a cumulative pattern underlying the items. If I wanted to buy a hammer, I would need to choose between good, better, and best. The “best” does all the stuff done by the “better” and then some. Product features are cumulative. First class provides all the benefits of second class plus some extras. And the same holds for services. We can talk about meeting or exceeding expectations only because we all understand the cumulative ordering. Consumers know when they receive only basic service, and they can tell you when they receive more than the minimum required. Again, the effects are cumulative. A successful brand must always provide the basics. It exceeds our expectations by doing more, and we can capture that “more” by including additional items in our questionnaire.