[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I recently made three posts regarding analysis of ordinal data. A post looking at all methods I could find in R, a post with an additional method and a post using JAGS. Common in all three was using the cheese data, a data set where only product data was available, observers were not there. The real world is different, there are observers and they are quite different. So, in this post I have added observers and look at two models again; Simple ANOVA and, complex, an ordinal model with random observers.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
To spare myself looking for data, and keeping in mind I may want to run some simulations, I have created a data generator. In this generator I entered some of the things I could imagine regarding the observers. First of all they all have different sensitivities. In other words, what is very salt for some, is not at all salt for others. This is expressed in an additive and a multiplicative effect. Second, they have observation error. The same product does not always give the same response. Third is the categories, observers use them slightly different. I am absolutely aware these are much more properties than I might estimate. After all, a typical consumer takes two or three products in a test, while I have many more parameters. Finally, the outer limits (categories 1 and 9) are placed a bit outward. This is because there is an aversion to using these categories.num2scale <- function(x,scale) {
cut(x,c(-Inf,scale,Inf),1:(length(scale)+1))
}
pop.limits2ind.limits <- function(scale,sd=.5) {
newscale <- scale+rnorm(length(scale),sd=sd)
newscale[order(newscale)]
}
obs.score <- function(obs,pop.score,pop.limits,scale.sd=0.5,
sensitivity.sd=.1, precision.sd=1,
additive.sd=2,center.scale=5,
labels=LETTERS[1:length(pop.score)]) {
# individual sensitivity (multiplicative)
obs.errorfreeintensity <- center.scale +
(pop.score-center.scale)*rlnorm(1,sd=sensitivity.sd)
#individual (additive)
obs.errorfreeintensity <- obs.errorfreeintensity +
rnorm(1,mean=0,sd=additive.sd)
# individual observation error
obs.intensity <- obs.errorfreeintensity+
rnorm(length(pop.score),sd=precision.sd)
# individual cut offs between categories
obs.limits <- pop.limits2ind.limits(pop.limits)
obs.score <- num2scale(obs.intensity,obs.limits)
data.frame(obs=obs,
score = obs.score,
product=labels
)
}
panel.score <- function(n=100,pop.score,pop.limits,scale.sd=0.5,
sensitivity.sd=.1,precision.sd=1,
additive.sd=2,center.scale=5,
labels=LETTERS[1:length(pop.score)]) {
la <- lapply(1:n,function(x) {
obs.score(x,pop.score,pop.limits,scale.sd,sensitivity.sd,
precision.sd,labels=labels)
})
dc <- do.call(rbind,la)
dc$obs <- factor(dc$obs)
dc
}
pop.limits <- c(1,3:8,10)
prodmeans <- c(7,7,8)
scores <- panel.score(40,prodmeans,pop.limits)
scores$numresponse <- as.numeric(levels(scores$score))[scores$score]
Analysis – ANOVA
The first analysis method is ANOVA. Plain ANOVA. Not because I dislike variance components or such, but because this is what is done most commonly. On top of that, I cannot imagine somebody taking these data, pulling it through a mixed model, while forgetting it is not from a continuous scale.library(multcomp)
Res.aov <- aov( numresponse ~ obs + product , data=scores)
anova(Res.aov)
Analysis of Variance Table
Response: numresponse
Df Sum Sq Mean Sq F value Pr(>F)
obs 39 239.300 6.1359 6.3686 2.321e-12 ***
product 2 9.517 4.7583 4.9388 0.00956 **
Residuals 78 75.150 0.9635
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
mt <- model.tables(Res.aov,type=’means’,cter)
data.frame(dimnames(mt$tables$product)[1],
Response=as.numeric(mt$tables$product),
LSD=cld(glht(Res.aov,linfct=mcp(product=’Tukey’))
)$mcletters$monospacedLetters
)
product Response LSD
A A 6.725 a
B B 6.850 a
C C 7.375 b
Analysis – cumulative link mixed model (clmm)
The alternative, is the other extreme. Do it as correct as available, using both a cumulative link and making observers random. This model is standard available in the ordinal package. This means two intermediate models are not shown. Both of these models lack the simplicity of ANOVA while being less correct than clmm, hence inferior to both clmm and ANOVA.
library(ordinal)
Res.clmm <- clmm(score ~ product + (1|obs),data=scores)
summary(Res.clmm)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: score ~ product + (1 | obs)
data: scores
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 120 -179.57 379.14 37(5046) 8.77e-06 Inf
Random effects:
Var Std.Dev
obs 8.425 2.903
Number of groups: obs 40
Coefficients:
Estimate Std. Error z value Pr(>|z|)
productB 0.1309 0.4225 0.310 0.75680
productC 1.4640 0.4718 3.103 0.00192 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Threshold coefficients:
Estimate Std. Error z value
2|3 -6.0078 NA NA
3|4 -5.6340 NA NA
4|5 -4.1059 0.4528 -9.069
5|6 -3.0521 0.4769 -6.400
6|7 -1.0248 0.4970 -2.062
7|8 0.5499 0.5303 1.037
8|9 4.0583 0.7429 5.463
anova(Res.clmm2,Res.clmm)
Likelihood ratio tests of cumulative link models:
formula: link: threshold:
Res.clmm2 score ~ 1 + (1 | obs) logit flexible
Res.clmm score ~ product + (1 | obs) logit flexible
no.par AIC logLik LR.stat df Pr(>Chisq)
Res.clmm2 8 387.41 -185.70
Res.clmm 10 379.14 -179.57 12.266 2 0.00217 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
vc <- vcov(Res.clmm)[c 1=”/><span” 2=”style="background-color:” 3=”#f3f3f3;” 4=”-family:” 5=”Courier” 6=”New,” 7=”Courier,” 8=”monospace;” 9=”-size:” 10=”x-small;"> ” 11=” ” 12=”c('7|8','productB','productC')” language=”('7|8','productB','productC'),</span><br”][/c]
names(co) <- levels(scores$product)
sd <- sqrt(c(vc[1,1],diag(vc)[-1]+vc[1,1]-2*vc[1,-1]))
data.frame(
`top 2 box`=arm::invlogit(c(-co[1],-co[1]+co[-1])),
`2.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.025)*sd),
`97.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.975)*sd),
check.names=FALSE
)
top 2 box 2.5% limit 97.5% limit
A 0.3658831 0.1694873 0.6199713
B 0.3967401 0.1753347 0.6704337
C 0.7138311 0.4498042 0.8838691
library(ordinal)
Res.clmm <- clmm(score ~ product + (1|obs),data=scores)
summary(Res.clmm)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: score ~ product + (1 | obs)
data: scores
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 120 -179.57 379.14 37(5046) 8.77e-06 Inf
Random effects:
Var Std.Dev
obs 8.425 2.903
Number of groups: obs 40
Coefficients:
Estimate Std. Error z value Pr(>|z|)
productB 0.1309 0.4225 0.310 0.75680
productC 1.4640 0.4718 3.103 0.00192 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Threshold coefficients:
Estimate Std. Error z value
2|3 -6.0078 NA NA
3|4 -5.6340 NA NA
4|5 -4.1059 0.4528 -9.069
5|6 -3.0521 0.4769 -6.400
6|7 -1.0248 0.4970 -2.062
7|8 0.5499 0.5303 1.037
8|9 4.0583 0.7429 5.463
Res.clmm2 <- clmm(score ~ 1 + (1|obs),data=scores)
anova(Res.clmm2,Res.clmm)
Likelihood ratio tests of cumulative link models:
formula: link: threshold:
Res.clmm2 score ~ 1 + (1 | obs) logit flexible
Res.clmm score ~ product + (1 | obs) logit flexible
no.par AIC logLik LR.stat df Pr(>Chisq)
Res.clmm2 8 387.41 -185.70
Res.clmm 10 379.14 -179.57 12.266 2 0.00217 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
co <- coef(Res.clmm)[c language=”('7|8','productB','productC')”][/c]
vc <- vcov(Res.clmm)[c 1=”/><span” 2=”style="background-color:” 3=”#f3f3f3;” 4=”-family:” 5=”Courier” 6=”New,” 7=”Courier,” 8=”monospace;” 9=”-size:” 10=”x-small;"> ” 11=” ” 12=”c('7|8','productB','productC')” language=”('7|8','productB','productC'),</span><br”][/c]
names(co) <- levels(scores$product)
sd <- sqrt(c(vc[1,1],diag(vc)[-1]+vc[1,1]-2*vc[1,-1]))
data.frame(
`top 2 box`=arm::invlogit(c(-co[1],-co[1]+co[-1])),
`2.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.025)*sd),
`97.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.975)*sd),
check.names=FALSE
)
top 2 box 2.5% limit 97.5% limit
A 0.3658831 0.1694873 0.6199713
B 0.3967401 0.1753347 0.6704337
C 0.7138311 0.4498042 0.8838691
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.