Bayesian ANOVA for sensory panel profiling data
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post it is examined if it is possible to use Bayesian methods and specifically JAGS to analyze sensory profiling data. The aim is not to obtain different results, but rather to confirm that the results are fairly similar. The data used is the chocolate data from SensoMineR and the script is adapted from various online sources examined over a longer period.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Building the model
JAGS, (but also WinBugs and OpenBugs) are programs which can be used to provide samples from posterior distributions. It is up to the user to provide data and model to JAGS and interpret the samples. Luckily, R provides infrastructure to help both in setting up models and data and in posterior analysis of the samples. Still, there are some steps to be done, before the analysis can be executed; Setting up data, defining model, initializing variables and deciding which parameters of the model are interesting. Only then JAGS can be called.Setting up data
Data is any data which goes into JAGS. This includes experimental design, measurements, but also number of rows in the data. For this post I have added some extra data, since I want to compare differences between product means. As I want to compare those, I need to have samples from these specific distributions. All the data needs to go into one big list, which will be given to JAGS later on.
# get libraries
library(SensoMineR)
library(coda)
library(R2jags)
# infrastructure for contrasts
FullContrast <- function(n) {
UseMethod(‘FullContrast’,n)
}
FullContrast.default <- function(n) stop('FullContrast only takes integer and factors')
FullContrast.integer <- function(n) {
mygrid <- expand.grid(v1=1:n,v2=1:n)
mygrid <- mygrid[mygrid$v1
rownames(mygrid) <- 1:nrow(mygrid)
as.matrix(mygrid)
}
FullContrast.factor <- function(n) {
FullContrast(nlevels(n))
}
# reading data
data(chocolates)
# setting up experimental data
data_list <- with(sensochoc,list(Panelist = as.numeric(Panelist),
nPanelist = nlevels(Panelist),
Product = as.numeric(Product),
nProduct = nlevels(Product),
y=CocoaA,
N=nrow(sensochoc)))
# contrasts
data_list$Panelistcontr <- FullContrast(sensochoc$Panelist)
data_list$nPanelistcontr <- nrow(data_list$Panelistcontr)
data_list$Productcontr <- FullContrast(sensochoc$Product)
data_list$nProductcontr <- nrow(data_list$Productcontr)
Model definition
The model can be written in ‘plain’ R and then given to JAGS. However, JAGS does not have vector operations, hence there are a lot of for loops which would be unacceptable for normal R usage. Besides the additive effects in the first part of the model, there are quite some extras. all the means in the model are coming out of hyperdistributions. In this case these are normal distributed. The precision (and hence variance) of these hyperdistributions are determined on basis of the data. The final part of the model translates the internal parameters into something which is sensible to interpret. The model also needs to be written into a file so JAGS can use it later on, these are the last two lines.mymodel <- function() {
# core of the model
for (i in 1:N) {
fit[i] <- grandmean + mPanelist[Panelist[i]] + mProduct[Product[i]] + mPanelistProduct[Panelist[i],Product[i]]
y[i] ~ dnorm(fit[i],tau)
}
# grand mean and residual
tau ~ dgamma(0.001,0.001)
gsd <- sqrt(1/tau)
grandmean ~ dnorm(0,.001)
# variable Panelist distribution
mPanelist[1] <- 0
for (i in 2:nPanelist) {
mPanelist[i] ~ dnorm(offsetPanelist,tauPanelist)
}
offsetPanelist ~ dnorm(0,.001)
tauPanelist ~ dgamma(0.001,0.001)
sdPanelist <- sqrt(1/tauPanelist)
# Product distribution
mProduct[1] <- 0
for (i in 2:nProduct) {
mProduct[i] ~ dnorm(offsetProduct,tauProduct)
}
offsetProduct ~ dnorm(0,0.001)
tauProduct ~ dgamma(0.001,0.001)
sdProduct <- sqrt( 1/tauProduct)
# interaction distribution
for (i in 1:nPanelist) {
mPanelistProduct[i,1] <- 0
}
for (i in 2:nProduct) {
mPanelistProduct[1,i] <- 0
}
for (iPa in 2:nPanelist) {
for (iPr in 2:nProduct) {
mPanelistProduct[iPa,iPr] ~dnorm(offsetPP,tauPP)
}
}
offsetPP ~dnorm(0,0.001)
tauPP ~dgamma(0.001,0.001)
sdPP <- 1/sqrt(tauPP)
# getting the interesting data
# true means for Panelist
for (i in 1:nPanelist) {
meanPanelist[i] <- grandmean + mPanelist[i] + mean(mPanelistProduct[i,1:nProduct]) + mean(mProduct[1:nProduct])
}
# true means for Product
for (i in 1:nProduct) {
meanProduct[i] <- grandmean + mProduct[i] + mean(mPanelistProduct[1:nPanelist,i]) + mean(mPanelist[1:nPanelist])
}
for (i in 1:nPanelistcontr) {
Panelistdiff[i] <- meanPanelist[Panelistcontr[i,1]]-meanPanelist[Panelistcontr[i,2]]
}
for (i in 1:nProductcontr) {
Productdiff[i] <- meanProduct[Productcontr[i,1]]-meanProduct[Productcontr[i,2]]
}
}
model.file <- file.path(tempdir(),'mijnmodel.txt')
write.model(mymodel,con=model.file)
Initializing variables
Anything values in the model which are not provided by the data, needs to be initialized. It is most convenient to setup a little model which can be used to get these values.inits <- function() list(
grandmean = rnorm(1,3,1),
mPanelist = c(0,rnorm(data_list$nPanelist-1)) ,
mProduct = c(0,rnorm(data_list$nProduct-1)) ,
mPanelistProduct = rbind(rep(0,data_list$nProduct),cbind(rep(0,data_list$nPanelist-1),matrix(rnorm((data_list$nPanelist-1)*(data_list$nProduct-1)),nrow=data_list$nPanelist-1,ncol=data_list$nProduct-1)))
,
tau = runif(1,1,2),
tauPanelist = runif(1,1,3),
tauProduct = runif(1,1,3)
)
parameters of interest and calling JAGS
The parameters of interest is basically anything which we want know anything about. The JAGS call, is just listing all the parts provided before to JAGS. In this case, the model runs fairly quick, so I decided to have some extra iterations (n.iter) and an extra chain. For this moment, I decided not to calculate DIC.parameters <- c('sdPanelist','sdProduct','gsd','meanPanelist',
‘meanProduct’,’Productdiff’,’sdPP’)
jagsfit <- jags(data=data_list,inits=inits,model.file=model.file,
parameters.to.save=parameters,n.chains=4,DIC=FALSE,n.iter=10000)
Result
The first part of the result can be obtained via a simple print of jagsfit
jagsfit
Inference for Bugs model at "/tmp/Rtmp0VFW9g/mijnmodel.txt", fit using jags, 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5 n.sims = 4000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff Productdiff[1] 0.562 0.312 -0.037 0.356 0.558 0.773 1.186 1.001 4000 Productdiff[2] 2.303 0.317 1.697 2.087 2.299 2.512 2.932 1.001 4000 Productdiff[3] 1.741 0.314 1.118 1.530 1.748 1.948 2.346 1.001 3700 Productdiff[4] 0.836 0.307 0.224 0.630 0.838 1.044 1.424 1.002 2000 Productdiff[5] 0.274 0.303 -0.322 0.069 0.273 0.478 0.870 1.001 2600 Productdiff[6] -1.467 0.313 -2.081 -1.671 -1.465 -1.257 -0.865 1.001 2700 Productdiff[7] 0.339 0.308 -0.263 0.130 0.338 0.537 0.951 1.001 4000 Productdiff[8] -0.223 0.299 -0.807 -0.430 -0.218 -0.021 0.349 1.001 4000 Productdiff[9] -1.965 0.319 -2.587 -2.183 -1.966 -1.757 -1.322 1.001 4000 Productdiff[10] -0.497 0.294 -1.068 -0.699 -0.495 -0.300 0.082 1.002 1600 Productdiff[11] 0.740 0.317 0.130 0.525 0.733 0.952 1.374 1.001 4000 Productdiff[12] 0.177 0.300 -0.412 -0.019 0.178 0.382 0.753 1.001 4000 Productdiff[13] -1.564 0.322 -2.186 -1.776 -1.570 -1.349 -0.913 1.001 3300 Productdiff[14] -0.097 0.300 -0.686 -0.300 -0.090 0.102 0.505 1.002 2100 Productdiff[15] 0.401 0.300 -0.180 0.197 0.405 0.597 0.997 1.001 4000 gsd 1.689 0.067 1.563 1.644 1.687 1.734 1.821 1.002 1700 meanPanelist[1] 6.667 0.492 5.680 6.334 6.659 7.002 7.601 1.001 3100 meanPanelist[2] 5.513 0.436 4.660 5.219 5.512 5.808 6.369 1.001 4000 meanPanelist[3] 6.325 0.443 5.439 6.031 6.331 6.624 7.171 1.001 4000 meanPanelist[4] 7.394 0.443 6.542 7.094 7.400 7.703 8.262 1.002 1800 meanPanelist[5] 7.104 0.445 6.227 6.807 7.104 7.405 7.982 1.001 4000 meanPanelist[6] 6.201 0.436 5.328 5.912 6.208 6.497 7.034 1.001 4000 meanPanelist[7] 6.386 0.444 5.521 6.078 6.384 6.688 7.248 1.003 1100 meanPanelist[8] 6.451 0.443 5.562 6.160 6.452 6.760 7.301 1.001 4000 meanPanelist[9] 5.184 0.441 4.334 4.882 5.184 5.480 6.050 1.002 2300 meanPanelist[10] 8.110 0.460 7.212 7.815 8.107 8.411 9.020 1.001 3700 meanPanelist[11] 5.122 0.448 4.227 4.811 5.121 5.433 5.989 1.002 2100 meanPanelist[12] 7.191 0.441 6.327 6.891 7.181 7.492 8.061 1.001 2900 meanPanelist[13] 6.719 0.438 5.893 6.421 6.720 7.013 7.563 1.001 4000 meanPanelist[14] 6.266 0.440 5.396 5.978 6.266 6.553 7.138 1.002 1700 meanPanelist[15] 6.859 0.434 6.002 6.571 6.867 7.148 7.716 1.002 2200 meanPanelist[16] 6.079 0.434 5.238 5.783 6.082 6.371 6.930 1.002 2100 meanPanelist[17] 6.054 0.433 5.213 5.769 6.051 6.345 6.893 1.002 1800 meanPanelist[18] 6.122 0.420 5.281 5.844 6.129 6.398 6.939 1.002 2200 meanPanelist[19] 6.185 0.438 5.324 5.888 6.182 6.479 7.064 1.002 1500 meanPanelist[20] 4.988 0.456 4.088 4.680 4.997 5.292 5.873 1.001 4000 meanPanelist[21] 4.994 0.455 4.064 4.689 4.996 5.305 5.884 1.001 3400 meanPanelist[22] 5.062 0.451 4.177 4.763 5.065 5.370 5.944 1.001 2600 meanPanelist[23] 7.171 0.453 6.296 6.855 7.172 7.480 8.057 1.001 4000 meanPanelist[24] 5.663 0.442 4.799 5.368 5.671 5.959 6.532 1.002 2400 meanPanelist[25] 7.514 0.455 6.627 7.210 7.518 7.815 8.425 1.001 4000 meanPanelist[26] 6.320 0.439 5.432 6.023 6.325 6.624 7.156 1.001 2900 meanPanelist[27] 6.853 0.431 6.013 6.562 6.843 7.144 7.705 1.001 4000 meanPanelist[28] 7.045 0.440 6.197 6.741 7.043 7.340 7.926 1.001 4000 meanPanelist[29] 4.789 0.456 3.915 4.472 4.789 5.096 5.687 1.001 4000 meanProduct[1] 7.084 0.223 6.653 6.937 7.081 7.230 7.539 1.001 4000 meanProduct[2] 6.522 0.215 6.097 6.375 6.524 6.669 6.928 1.001 4000 meanProduct[3] 4.781 0.227 4.332 4.626 4.778 4.927 5.227 1.001 4000 meanProduct[4] 6.248 0.213 5.838 6.101 6.248 6.394 6.660 1.002 1500 meanProduct[5] 6.745 0.217 6.317 6.605 6.748 6.885 7.171 1.001 4000 meanProduct[6] 6.344 0.221 5.910 6.199 6.347 6.494 6.772 1.001 4000 sdPP 0.140 0.123 0.024 0.050 0.096 0.193 0.461 1.010 290 sdPanelist 0.996 0.178 0.700 0.869 0.978 1.106 1.393 1.001 4000 sdProduct 0.996 0.536 0.426 0.670 0.864 1.164 2.293 1.001 4000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
Variables gsd and sdPanelist might be used to examine panel performance, but to examine this better, they should be compared with results from other descriptors.
plot(jagsfit)
Details about products
A main question if obviously, which products are different? For this we can extract some data from a summaryjagsfit.mc <- as.mcmc(jagsfit)
# plot(jagsfit.mc) # this plot give too many figures for the blog
fitsummary <- summary(jagsfit.mc)
# extract differences
Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),]
# extract differences different from 0
data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]
# get the product means
ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),]
rownames(ProductMean) <- levels(sensochoc$Product)
ProductMean
The result shows us a table of product pairs which are different; most of these are related to product 3, but also product 1 is different from 4 and 6.
> jagsfit.mc <- as.mcmc(jagsfit) > # plot(jagsfit.mc) > fitsummary <- summary(jagsfit.mc) > # extract differences > Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),] > # extract differences different from 0 > data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,] v1 v2 2 1 3 3 2 3 4 1 4 6 3 4 9 3 5 11 1 6 13 3 6> # get the product means > ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),] > rownames(ProductMean) <- levels(sensochoc$Product) > ProductMean
Mean SD Naive SE Time-series SE choc1 7.083966 0.2225106 0.003518201 0.003233152 choc2 6.521746 0.2150476 0.003400201 0.003988800 choc3 4.780643 0.2272578 0.003593261 0.003553964 choc4 6.247723 0.2128971 0.003366198 0.003382249 choc5 6.745146 0.2165525 0.003423996 0.003230432 choc6 6.344260 0.2206372 0.003488580 0.003662731
Comparison with ANOVA
A few lines in R will give the standard analysis. The product means are very close. This ANOVA shows only differences involving product 3. This is probably due to usage of TukeyHSD, which can be a bit conservative in the ANOVA while the comparison in the Bayesian model is unprotected.
> library(car) > aa <- aov(CocoaA ~ Product * Panelist,data=sensochoc) > Anova(aa) Anova Table (Type II tests) Response: CocoaA Sum Sq Df F value Pr(>F) Product 207.54 5 12.6045 1.876e-10 *** Panelist 390.43 28 4.2343 1.670e-09 *** Product:Panelist 322.29 140 0.6991 0.9861 Residuals 573.00 174 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > TukeyHSD(aa,'Product') Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = CocoaA ~ Product * Panelist, data = sensochoc) $Product diff lwr upr p adj choc2-choc1 -0.5344828 -1.5055716 0.4366061 0.6088175 choc3-choc1 -2.4137931 -3.3848819 -1.4427043 0.0000000 choc4-choc1 -0.8275862 -1.7986750 0.1435026 0.1433035 choc5-choc1 -0.2931034 -1.2641923 0.6779854 0.9531915 choc6-choc1 -0.7241379 -1.6952268 0.2469509 0.2674547 choc3-choc2 -1.8793103 -2.8503992 -0.9082215 0.0000014 choc4-choc2 -0.2931034 -1.2641923 0.6779854 0.9531915 choc5-choc2 0.2413793 -0.7297095 1.2124682 0.9797619 choc6-choc2 -0.1896552 -1.1607440 0.7814337 0.9932446 choc4-choc3 1.5862069 0.6151181 2.5572957 0.0000742 choc5-choc3 2.1206897 1.1496008 3.0917785 0.0000000 choc6-choc3 1.6896552 0.7185663 2.6607440 0.0000192 choc5-choc4 0.5344828 -0.4366061 1.5055716 0.6088175 choc6-choc4 0.1034483 -0.8676406 1.0745371 0.9996304 choc6-choc5 -0.4310345 -1.4021233 0.5400544 0.7960685 > model.tables(aa,type='mean',cterms='Product') Tables of means Grand mean 6.287356 Product Product choc1 choc2 choc3 choc4 choc5 choc6 7.086 6.552 4.672 6.259 6.793 6.362
Conclusion
JAGS can be used to analyzed sensory profiling data. However, if a simple model such as two way ANOVA is used, it does not seem to be worth the trouble.
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.