More Explorations with catR
[This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
# For the purposes of simulating computerized adaptive tests # the R package catR is unparallelled. # catR is an excellent tool for students who are curious about # how a computerized adaptive test might work. It is also useful # for testing companies that are interested in seeing how # their choices of number of items, or model, stopping rule, # or quite a few of the other options which are available # when designing a specific computerized adaptive test. # In this post I will explore some of the features of the # function randomCAT, an extremely powerful function # that simulates an entire response pattern for an individual. # In a previous post I explore some of the other function # in catR in order to step by step demonstrate how to use # the package to simulate a test. library("catR") # First let's generate an item bank. # Items specifies how many items to generate # Model specifies which model to use in generating the items # a,b,c Priors are specifying distributions to draw # the parameters from for each item. # The final set of arguments is for specifying # what range of theta values the bank will initially # draw item parameters for. Theta values are the typical # latent traits for which item response theory is concerned # with estimating. Bank <- createItemBank(items = 500, model = "3PL", aPrior=c("norm",1,0.2), bPrior=c("norm",0,1), cPrior=c("unif",0,0.25), thMin = -4, thMax = 4, step = 0.05) # We may want to examine the object we have created called "Bank" attributes(Bank) # Within the Bank object of class "itBank" there is three named # attributes. # itemPar lists the item parameters for those items which have been # generated. We could see a histogram of difficulty parameters (b) by # targeting within the Bank object: hist(Bank$itemPar[,2], breaks=30, main="Distribution of Item Difficulties", xlab="b parameter") # We can also see how much information a particular item would add # accross a range of ability levels. This information is already # available within the Bank object under the names infoTab and # theta. # Plot the first item's information plot(rep(Bank$theta,1),Bank$infoTab[,1], type="l", main="Item 1's information", xlab="Ability (theta)", ylab="Information") # Plot the first 3 items # By specifying type = "n" this plot is left empty nitems = 3 plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n", main=paste0("First ",nitems," items' information"), xlab="Ability (theta)", ylab="Information") # Now we plot the for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i], col=grey(.8*i/nitems)) # We can see how different items can have information that # spans different ability estimates as well as some items # which just have more information than other items. # Plotting all 500 items (same code as previously but now # we specify the number of items as 500) nitems = 500 plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n", main=paste0("First ",nitems," items' information"), xlab="Ability (theta)", ylab="Information") for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i], col=grey(.8*i/nitems)) # This plot may look nonsensical at first. Be it actually # provides some useful information. From it you can see the # maximum amount of information available for any one # item at different levels of ability. In the places where # there is only one very tall item standing out we may be # concerned about item exposure since subjects which seem to # be in the area of that item are disproportionately more likely # to get the same high info item than other other subjects # in which the next highest item is very close in information # to the max item. # To see the max information for each ability we can add a line. lines(Bank$theta,apply(Bank$infoTab, 1, max), col="blue", lwd=2) # We might also be interested in seeing how much information # on average a random item chosen from the bank would provide # or in other words what is the expected information from a # random item drawn from the bank at different ability levels. lines(Bank$theta,apply(Bank$infoTab, 1, mean), col="red", lwd=2) # Or perhaps we might want to see what the maximum average information # for a 20 item test might be. So we calculate the average information # for the top 20 items at different ability levels. maxmean <- function(x, length=20) mean(sort(x, decreasing=T)[1:length]) maxmean(1:100) # Returns 90.5, seems to be working properly lines(Bank$theta,apply(Bank$infoTab, 1, maxmean), col="orange", lwd=3) # Now this last line is very interesting because it reflects # per item the maximum amount of information this bank can provide # given a fixed length of 20. Multiply this curve by 20 and it will give # us the maximum information this bank can provide given a 20 item test # and a subject's ability. # This can really be thought of as a theoretical maximum for which # any particular CAT test might attempt to meet but on average will # always fall short. # We can add a lengend legend(-4.2, .55, c("max item info", "mean(info)", "mean(top items)"), lty = 1, col = c("blue","red","orange"), adj = c(0, 0.6)) library("reshape") library("ggplot2") # Let's seperate info tab infoTab <- Bank$infoTab # Let's add three columns to info tab for max, mean, and mean(top 20) infoTab <- cbind(infoTab, apply(Bank$infoTab, 1, max), apply(Bank$infoTab, 1, mean), apply(Bank$infoTab, 1, maxmean)) # Melt will turn the item information array into a long object items.long <- melt(infoTab) # Let's assign values to the first column which are thetas items.long[,1] <- Bank$theta # Now we are ready to name the different columns created by melt names(items.long) <- c("theta", "item", "info") <- factor("Item", c("Item","Max", "Mean", "Mean(Max)")) items.long <- cbind(items.long, type=) items.long[items.long$item==501,4] <- "Max" items.long[items.long$item==502,4] <- "Mean" items.long[items.long$item==503,4] <- "Mean(Max)" # Now we are ready to start plotting # Assign the data to a ggplot object a <- ggplot(items.long, aes(x=theta, y=info, group=item)) # Plot a particular instance of the object a + geom_line(colour = gray(.2)) + geom_line(aes(colour = type), size=2 , subset = .(type %in% c("Max", "Mean", "Mean(Max)"))) # Now let's look at how the randomCAT function works. # There are a number of arguments that the randomCAt function # can take. They can be defined as lists which are fed # into the function. # I will specify only that the stoping rule is 20 items. # By specifying true Theta that is telling random CAT what the # true ability level we are estimating. res <- randomCAT(trueTheta = 3, itemBank = Bank, test=list(method = "ML"), stop = list(rule = "length", thr = 20)) # I specify test (theta estimator) as using ML because the # default which is Bayesian model is strongly centrally # biased in this case. # Let's examine what elements are contained with the object "res" attributes(res) # We can see our example response pattern. thetaEst <- c(0, res$thetaProv) plot(1:21, thetaEst, type="n", xlab="Item Number", ylab="Ability Estimate", main="Sample Random Response Pattern") # Add true ability line abline(h=3, col="red", lwd=2, lty=2) # Add a line connecting responses lines(1:21, thetaEst, type="l", col=grey(.8)) # Add the response pattern to text(1:21, thetaEst, c(res$pattern, "X")) # Add the legend legend(15,1,"True Ability", col="red", lty=2, lwd=2) # Plot the sample item information from the set of items selected. plot(rep(Bank$theta,20),Bank$infoTab[,res$testItems], type="n", main="High information items are often selected", xlab="Ability (theta)", ylab="Information") for (i in 1:500) lines(Bank$theta,Bank$infoTab[,i], col=grey(.75)) # Now we plot the for (i in res$testItems) lines(Bank$theta,Bank$infoTab[,i], lwd=2, col=grey(.2)) # Now let's see how randomCat performs with a random draw # of 150 people with different ability estimates. npers <- 150 # Specify number of people to simulate theta <- rnorm(npers) # Draw a theta ability level vector thetaest <- numeric(npers) # Creates an empty vector of zeros to hold future estimates # of theta # Create an empty item object items.used <- NULL # Create an empty object to hold b values for items used b.values <- NULL for (i in 1:npers) { # Input the particular theta[i] ability for a particular run. res <- randomCAT(trueTheta = theta[i], itemBank = Bank, test=list(method = "ML"), stop = list(rule = "length", thr = 20)) # Save theta final estimates thetaest[i] <- res$thFinal # Save a list of items selected in each row of items.used items.used <- rbind(items.used, res$testItems) # Save a list of b values of items selected in each row b.values <- rbind(b.values, res$itemPar[,2]) } # Let's see how our estimated theta's compare with our true plot(theta, thetaest, main="Ability plotted against ability estimates", ylab="theta estimate")
# To get a sense of how much exposure our items get itemTab <- table(items.used) length(itemTab) # We can see we only have 92 items used for all 150 subjects # taking the cat exam. mean(itemTab) # On average each item used is exposed 32 times which means mean(itemTab)/150 # over a 20% exposure rate on average in addition to some items # have much higher exposure rates.
To leave a comment for the author, please follow the link and comment on their blog: Econometrics by Simulation.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.