R index between two products is somewhat dependent on other products
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I explained earlier how R-index is used in sensory is used to examine ranking data. The legitimization to use R-index is in the link with d’ and with Mann-Whitney statistic. In this post I show there is a dependence on the number of products and position of other products on the R index. It is a small effect. However, if data is analyzed by looking only and rigidly at the p value, then the result might chance from just under significant to just over significance.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Using simulations, I will show that presence of other samples influences the R-index. I think this effect occurs because the R index is, mathematically, calculated from an aggregated matrix of counts of product against ranks. It is my feeling, that when there are more products, there are less chances to get equal rankings than with few products and hence slightly different scores.
R index calculation
Below the calculation when comparing 2 products from a total of 4
R index calculation matrix
rank 1 rank 2 rank 3 rank 4
product 1 a b c d
product 2 e f g h
Note a to h are the counts in the respective cells.
The R index is composed of three parts:
1 The number of wins of product 1 over product 2:
a*(f+g+h) + b*(g+h) + c*h
2 The number of equal rankings divided by two
(a*e + b*f + c*g + d*h) /2
3 Normalization
(a+b+c+d)*(e+f+g+h)
R index = 100* wins*equal / normalization
Effect of number of products
Figure 1 shows the simulation R index dependence on the number of products, using a ranking with 25 panelists. With a low number of products, the distribution of the R index is a bit wider than with more products. Most of the difference in distribution is in the region 3 to 6 products, which is also the number of products often used in sensory.
(Critical values of R-indices are given by the red and blue lines (Bi and O’Mahony 1995 respectively 2007, Journal of Sensory Studies))
Effect of neighborhood of other products
Figure 2 shows the dependence on location of the other products. I have chosen 5 products, two have the same location. The other 3 move away from this location. Again 25 panelists. In this figure it shows that the two products R-index has a smaller distribution under H0 (no product differences) when all products are similar. This is about the same as the 5 products in the first plot. When the other products are far away, the distribution becomes wider, getting closer to the 3 product distribution in figure 1.
It should be written that with one product rather than three products moving away from the centre location the effect is smaller. Effect of number of panelists is for a next post.
Code for figure 1:
library(ggplot2)
makeRanksNoDiff <- function(nprod,nrep) {
inList <- lapply(1:nrep,function(x) sample(1:nprod,nprod) )
data.frame(person=factor(rep(1:nrep,each=nprod)),
prod=factor(rep(1:nprod,times=nrep)),
rank=unlist(inList))
}
tab2Rindex <- function(t1,t2) {
Rindex <- crossprod(rev(t1)[-1],cumsum(rev(t2[-1]))) + 0.5*crossprod(t1,t2)
100*Rindex/(sum(t1)*sum(t2))
}
FastAllRindex <- function(rankExperiment) {
crst <- xtabs(~ prod + rank,data=rankExperiment)
nprod <- nlevels(rankExperiment$prod)
Rindices <- unlist( lapply(1:(nprod-1),function(p1) {
lapply((p1+1):nprod,function(p2) tab2Rindex(crst[p1,],crst[p2,])) }) )
Rindices
}
nprod <- seq(3,25,by=1)
last <- lapply(nprod,function(xo) {
nsamples <- ceiling(10000/xo)
li <- lapply(1:nsamples,function(xi) {
re <- makeRanksNoDiff(nprod=xo,nrep=25)
FastAllRindex(re)
})
li2 <- as.data.frame(do.call(rbind,li))
li2$nprod <- xo
li2
} )
last2 <- lapply(last,function(x) {
qq <- quantile(as.matrix(x[,grep('nprod',names(x),invert=TRUE)]) ,c(0.025,.5,.975))
qq <- as.data.frame(t(qq))
qq$nprod <- x$nprod[1]
} )
summy <- do.call(rbind,last2)
g1 <- ggplot(summy,aes(nprod,`50%`) )
g1 <- g1+ geom_errorbar(aes(ymax = `97.5%`, ymin=`2.5%`))
g1 <- g1 + scale_y_continuous(name='R-index' )
g1 <- g1 + scale_x_continuous(name='Number of products to compare')
g1 <- g1 + geom_hline(yintercept=50 + 18.57*c(-1,1),colour='red')
g1 <- g1 + geom_hline(yintercept=50 + 15.21*c(-1,1),colour='blue')
g1
Additional code for figure 2
makeRanksDiff <- function(prods,nrep) {
nprod <- length(prods)
inList <- lapply(1:nrep,function(x) rank(rnorm(n=nprod,mean=prods)))
data.frame(person=factor(rep(1:nrep,each=nprod)),
prod=factor(rep(1:nprod,times=nrep)),
rank=unlist(inList))
}
location <- seq(0,3,by=.25)
last <- lapply(location,function(xo) {
li <- sapply(1:10000,function(xi) {
re <- makeRanksDiff(prod=c(0,0,xo,xo,xo),nrep=25)
crst <- xtabs(~ prod + rank,data=re)
tab2Rindex(crst[1,],crst[2,])
})
li2 <- data.frame(location=xo,Rindex=li)
li2
} )
last2 <- lapply(last,function(x) {
qq <- quantile( x$Rindex,c(0.025,.5,.975))
qq <- as.data.frame(t(qq))
qq$location <- x$location[1]
qq
} )
summy <- do.call(rbind,last2)
g1 <- ggplot(summy,aes(location,`50%`) )
g1 <- g1+ geom_errorbar(aes(ymax = `97.5%`, ymin=`2.5%`))
g1 <- g1 + scale_y_continuous(name='R-index between equal products' )
g1 <- g1 + scale_x_continuous(name='Location of odd products')
g1 <- g1 + geom_hline(yintercept=50 + 18.57*c(-1,1),colour='red')
g1 <- g1 + geom_hline(yintercept=50 + 15.21*c(-1,1),colour='blue')
g1
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.