Some Applications of Item Response Theory in R

Joel Cadwell

7 years ago

[This article was first published on Engaging Market Research, and kindly contributed to R-bloggers.]

The typical introduction to item response theory (IRT) positions the technique as a form of curve fitting. We believe that a latent continuous variable is responsible for the observed dichotomous or polytomous responses to a set of items (e.g., multiple choice questions on an exam or rating scales from a survey). Literally, once I know your latent score, I can predict your observed responses to all the items. Our task is to estimate that function with one, two or three parameters after determining that the latent trait is unidimensional. In the process of measuring individuals, we gather information about the items. Those one, two or three parameters are assessments of each item’s difficulty, discriminability and sensitivity to noise or guessing.
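As a minimal sketch of the curve being fitted, the function below writes the one-, two- and three-parameter logistic item response functions in base R. The function name `irf` and the argument names `b` (difficulty), `a` (discrimination) and `c` (guessing floor) are conventional labels chosen for this illustration, and the parameter values are made up.

```r
# Probability of a favorable response given the latent score theta.
# b = difficulty, a = discrimination, c = lower asymptote (guessing).
# With a = 1 and c = 0 this reduces to the one-parameter model.
irf <- function(theta, b, a = 1, c = 0) {
  c + (1 - c) * plogis(a * (theta - b))
}

theta <- seq(-3, 3, by = 1.5)   # a few latent scores (illustrative)

p_1pl <- irf(theta, b = 0)                   # difficulty only
p_2pl <- irf(theta, b = 0, a = 2)            # adds discrimination
p_3pl <- irf(theta, b = 0, a = 2, c = 0.2)   # adds a guessing floor

round(cbind(theta, p_1pl, p_2pl, p_3pl), 3)
```

Each column is the same S-shaped curve evaluated at the same latent scores. Raising `a` steepens the curve around `b`, and a nonzero `c` keeps the probability from falling to zero for respondents far below the item.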

All this has been translated into R by William Revelle, and as a measurement task, our work is done. We have an estimate of each individual’s latent position on an underlying continuum defined as whatever determines the item responses. Along the way, we discover which items require more of the latent trait in order to achieve a favorable response (e.g., the difficulty of answering correctly or the extremity of the item and/or the response). We can measure ability with achievement items, political ideology with an opinion survey, and brand perceptions with a list of satisfaction ratings.

To be clear, these scales are meant to differentiate among individuals. For example, the R statistical programming language has an underlying structure that orders the learning process so that the more complex concepts are mastered after the simpler material. In this case, learning is shaped by the difficulty of the subject matter with the more demanding content reusing or building onto what has already been learned. When the constraints are sufficient, individuals and their mastery can be arrayed on a common scale. At one end of the continuum are complex concepts that only the more advanced students master. The easier stuff falls toward the bottom of the scale with topics that almost everyone knows. When you take an R programming achievement test, your score tells me how well you performed relative to others who answered similar questions (see norm-referenced testing).

The same reasoning applies to the IRT analysis of political ideology (e.g., the R package basicspace). Opinions tend to follow a predictable path from liberal to conservative so that only a limited number of all possible configurations are actually observed. As shown below, legislative voting follows such a pattern, with Senators (dark line) and Representatives (light line) separated along the liberal-to-conservative dimension based on their votes in the 113th Congress. Although not shown, all the specific votes can also be placed on this same scale so that Pryor, Landrieu, Baucus and Hagan (in blue) are located toward the right because their votes on various bills and resolutions agreed more often with Republicans (in red). As with achievement testing, an order is imposed on the likely responses of objects so that the response space in p dimensions (where p equals the number of behaviors, items or votes) is reduced to a one-dimensional seriation of both votes and voters on the same scale.
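To see how strongly a single ordering limits the configurations that can occur, here is a small base R sketch under an idealized assumption: responses are deterministic (a Guttman pattern), so a respondent gives the favorable response to every item located below their own position. The number of items, the item locations and the respondent positions are all hypothetical.

```r
p <- 8                                  # number of items or votes (hypothetical)
b <- seq(-2, 2, length.out = p)         # item locations, ordered along the scale
theta <- seq(-3, 3, length.out = 200)   # respondent positions across the scale

# Deterministic response: 1 when the respondent lies above the item location
resp <- outer(theta, b, ">") * 1L

# Collapse each row to a string and count the distinct response patterns
patterns <- unique(apply(resp, 1, paste, collapse = ""))

length(patterns)   # patterns that actually occur: p + 1
2^p                # patterns possible with no ordering at all
```

With 8 items, only 9 of the 256 possible patterns appear. Real votes and survey responses are noisier than this, so more patterns show up in practice, but the observed patterns still concentrate near this ordered subset.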

My last example comes from marketing research where brand perceptions tend to be organized as a pattern of strengths and weaknesses defined by the product category. In a previous post, I showed how preference for Subway fast food restaurants is associated with a specific ordering of product and service attribute ratings. Many believe that Subway offers fresh and healthy food. Fewer like the taste or feel it is filling. Fewer still are happy with the ordering or preparation, and even more dislike the menu and the seating arrangements. These perceptions have an order so that if you are satisfied with the menu then you are likely to be satisfied with the taste and the freshness/healthiness of the food. Just as issues can be ordered from liberal to conservative, brand perceptions reflect the strengths and weaknesses promised by the brand’s positioning. Subway promises fresh and healthy food but not prepackaged and waiting under the heat lamp for easy bagging. The mean levels of our satisfaction ratings will be consistent with those brand priorities.

We can look at the same data from another perspective. Heatmaps summarize the triangular pattern observed in data matrices that can be modeled by IRT. In a second post analyzing the Subway data, I described the following heatmap showing the results from the 8-item checklist of features associated with the brand. Each row is a different respondent with the blue indicating that the item was checked and red telling us that the item was not checked. As one moves down the heatmap, the overall perceptions become more positive as additional attributes are endorsed. Positive brand perceptions are incremental, but the increments are not more of the same. Tasty and filling get added to healthy and fresh. That is, greater satisfaction with Subway is reflected in the willingness to endorse additional components of the brand promise. The heatmap is triangular so that those who are happy with the menu are likely to be at least as satisfied with all the attributes to the right.
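The triangular shape can be reproduced with simulated data. The sketch below does not use the Subway data. It generates a hypothetical 8-item checklist from a logistic model with a common slope, then sorts respondents by the number of items checked and items by how often they are checked. The item labels borrow attribute names from the text purely for illustration, and the item locations, slope and sample size are assumptions.

```r
set.seed(42)
n <- 200
# Illustrative labels only; not the actual checklist items
items <- c("fresh", "healthy", "tasty", "filling",
           "ordering", "preparation", "menu", "seating")

b <- seq(-1.5, 1.5, length.out = length(items))  # assumed item locations
theta <- rnorm(n)                                # simulated latent scores

# Probability of checking each item: logistic curve with common slope 2
prob <- plogis(2 * outer(theta, b, "-"))
checked <- matrix(rbinom(length(prob), 1, prob), nrow = n,
                  dimnames = list(NULL, items))

# Sort rows by number of items checked, columns by endorsement rate
row_ord <- order(rowSums(checked))
col_ord <- order(colMeans(checked), decreasing = TRUE)
sorted <- checked[row_ord, col_ord]

# Blue = checked, red = not checked; least positive respondents at the top
image(t(sorted)[, n:1], axes = FALSE, col = c("red", "blue"))
```

Sorting both margins is what makes the triangle visible: the blue region widens toward the bottom as respondents with higher totals add the less frequently endorsed items to the ones nearly everyone checks.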
