[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
Because I can’t share data from our item banks, I’ll generate a fake dataset to use in my demonstration. For the exams I’m using for my upcoming standard setting, I want to draw a large sample of items, stratified by both item difficulty (so that I have a range of items across the Rasch difficulties) and item domain (the topic from the exam outline that is assessed by that item). Let’s pretend I have an exam with 3 domains, and a bank of 600 items. I can generate that data like this:
domain1 <- data.frame(domain = 1, b = sort(rnorm(200)))
domain2 <- data.frame(domain = 2, b = sort(rnorm(200)))
domain3 <- data.frame(domain = 3, b = sort(rnorm(200)))
The variable domain is the domain label, and b is the item difficulty. I sorted that variable within each dataset so I can easily see that it spans a range of difficulties, both positive and negative.
head(domain1)
##   domain         b
## 1      1 -2.599194
## 2      1 -2.130286
## 3      1 -2.041127
## 4      1 -1.990036
## 5      1 -1.811251
## 6      1 -1.745899

tail(domain1)
##     domain        b
## 195      1 1.934733
## 196      1 1.953235
## 197      1 2.108284
## 198      1 2.357364
## 199      1 2.384353
## 200      1 2.699168
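Sorting and printing the ends of the data frame is one way to eyeball the spread. As a minimal sketch of an alternative, the range can be checked directly with base R. The seed below is my own assumption, added only so the sketch is reproducible, so the values will differ from the output shown above.

```r
set.seed(123)  # assumption: fixed seed, not part of the original demonstration

# Regenerate one fake domain of 200 items, as in the post
domain1 <- data.frame(domain = 1, b = sort(rnorm(200)))

# Lowest and highest simulated Rasch difficulty
range(domain1$b)

# Five-number summary plus the mean
summary(domain1$b)

# Proportion of items with a negative difficulty
mean(domain1$b < 0)
```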
If I want, I can easily combine these 3 datasets into 1:
item_difficulties <- rbind(domain1, domain2, domain3)
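With the bank in one data frame, the stratification described earlier (by domain and by difficulty) can be sketched roughly as follows. This is only an illustration under my own assumptions, not necessarily the approach used for the real exams: the difficulty strata are quartiles computed within each domain, the column name b_quartile is hypothetical, and the draw of 10 items per stratum is an arbitrary choice.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Rebuild a fake bank like the one in the post: 3 domains x 200 items
item_difficulties <- do.call(rbind, lapply(1:3, function(d) {
  data.frame(domain = d, b = sort(rnorm(200)))
}))

# Hypothetical difficulty strata: quartile of b within each domain (1 to 4)
item_difficulties$b_quartile <- ave(
  item_difficulties$b, item_difficulties$domain,
  FUN = function(x) {
    cut(x, breaks = quantile(x, probs = 0:4 / 4),
        include.lowest = TRUE, labels = FALSE)
  }
)

# Split into domain-by-quartile strata (3 x 4 = 12 groups)
strata <- split(
  item_difficulties,
  list(item_difficulties$domain, item_difficulties$b_quartile)
)

# Draw 10 items at random, without replacement, from every stratum
item_sample <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), 10), ]
}))

# Cross-tabulate the sample to confirm every stratum is represented
table(item_sample$domain, item_sample$b_quartile)
```

Using quartiles within each domain keeps the strata the same size, so a fixed draw per stratum gives equal coverage of easy and hard items in every domain. Fixed-width difficulty bands would be another reasonable choice if the goal is to cover the logit scale evenly instead.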
I can also easily visualize my item difficulties, by domain, as a group of histograms using ggplot2:
library(tidyverse)

item_difficulties %>%
  ggplot(aes(b)) +
  geom_histogram(show.legend = FALSE) +
  labs(x = "Item Difficulty", y = "Number of Items") +
  facet_wrap(~domain, ncol = 1, scales = "free") +
  theme_classic()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
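The stat_bin() message appears because geom_histogram() falls back to 30 bins when no bin size is given. A minimal sketch of the same plot with an explicit binwidth is below. The width of 0.25 logits and the seed are my own assumptions, and the data are regenerated here only so the block runs on its own.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Rebuild a fake bank like the one in the post: 3 domains x 200 items
item_difficulties <- do.call(rbind, lapply(1:3, function(d) {
  data.frame(domain = d, b = sort(rnorm(200)))
}))

# Same faceted histogram, but with the bin width stated explicitly
p <- ggplot(item_difficulties, aes(b)) +
  geom_histogram(binwidth = 0.25, show.legend = FALSE) +  # 0.25 is arbitrary
  labs(x = "Item Difficulty", y = "Number of Items") +
  facet_wrap(~domain, ncol = 1, scales = "free") +
  theme_classic()

print(p)
```

Setting binwidth rather than bins keeps the bars comparable across facets, since each bar then covers the same span of difficulty regardless of the range within a domain.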