[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R index is developed in interpreting signal detection data for human perception. In sensory research it is used to interpret ranking data. The value one gets out of an R-index calculation is interpreted as a confusion between samples tested. It has been popularized among others by Bi and O’Mahony.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The value of R-index ranges between 0% and 100%, with 50% representing equality.
The computation of the R index is not difficult and can be done using a few short functions.
Simulated data of ranking for 3 persons and 4 products is:
person prod rank
1 1 1 1
2 1 2 3
3 1 3 2
4 1 4 4
5 2 1 1
6 2 2 4
7 2 3 3
8 2 4 2
9 3 1 1
10 3 2 4
11 3 3 3
12 3 4 2
These data has as R indices:
prod1 prod2 Rindex
1 1 2 100.00000
2 1 3 100.00000
3 1 4 100.00000
4 2 3 11.11111
5 2 4 22.22222
6 3 4 44.44444
It is not obvious at first sight, but the R index is in all actuality not a continuous function. There are only so many permutations of rankings. To demonstrate this, we repeat a simulation 1000 times. With 25 simulated persons with each 4 products the frequency of R-indices is shown in the figure.
It is clear that R index exactly 50% is the most probable outcome. This represents that the products are all equal. It is also clear that some R-indices which are really close to 50 are much less probable. Finally, values under 20 and over 80 hardly occur. This is well know. Critical values of R-indices are given by the red and blue lines (Bi and O’Mahony 1995 respectively 2007, Journal of Sensory Studies)
R code for these results
# A simple Ranking
makeRanksNodiff <- function(nprod,nrep) {
inList <- lapply(1:nrep,function(x) sample(1:nprod,nprod) )
data.frame(person=factor(rep(1:nrep,each=nprod)),
prod=factor(rep(1:nprod,times=nrep)),
rank=unlist(inList))
}
# R index for two products out of a cross table
tab2Rindex <- function(t1,t2) {
Rindex <- crossprod(rev(t1)[-1],cumsum(rev(t2[-1]))) + 0.5*crossprod(t1,t2)
100*Rindex/(sum(t1)*sum(t2))
}
# R index for all products out of an experiment
allRindex <- function(rankExperiment) {
crst <- xtabs(~ prod + rank,data=rankExperiment)
nprod <- nlevels(rankExperiment$prod)
Rindices <- lapply(1:(nprod-1),function(p1) {
lap <- lapply((p1+1):nprod,function(p2) {
index <- tab2Rindex(crst[p1,],crst[p2,])
prod1 <- levels(rankExperiment$prod)[p1]
prod2 <- levels(rankExperiment$prod)[p2]
data.frame(prod1=prod1,prod2=prod2,Rindex=index)
})
return(do.call(rbind,lap))
})
do.call(rbind,Rindices)
}
# small data set
set.seed(12)
rankExperiment <- makeRanksNodiff(nprod=4,nrep=3)
rankExperiment
allRindex(rankExperiment)
# larger simulation
set.seed(12)
last <- lapply(1:1000,function(x) {
re <- makeRanksNodiff(nprod=4,nrep=25)
ar <- allRindex(re)
ar$Rindex
} )
rankmatrix <- do.call(rbind,last)
RindicesTable <- as.data.frame(table(rankmatrix))
RindicesTable$Rindex <- as.numeric(as.character(RindicesTable$rankmatrix))
library(ggplot2)
g1 <- ggplot(RindicesTable, aes( Rindex, Freq)) + geom_point(alpha=.5)
g1 <- g1 + geom_vline(xintercept=50 + 18.57*c(-1,1),colour=’red’)
g1 <- g1 + geom_vline(xintercept=50 + 15.21*c(-1,1),colour=’blue’)
g1
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.