Bayesian analysis of sensory profiling data
I looked at Bayesian analysis of sensory profiling data in May and June 2012. I remember not being totally happy with the result, and the computations took a bit more time than I wanted. But now it is 2014, I can use STAN, and I have been thinking a bit more about the model I want. Hence a fresh start.
Data
The data are the chocolate data from SensoMineR. I find it more convenient to have the data in long format, so the first action is a reshape. Score gets an as.numeric, since it was integer and I do not know how that affects the STAN calculations.
data(chocolates, package='SensoMineR')
vars <- names(sensochoc)
choc <- reshape(sensochoc,
idvar=vars[1:4],
varying=list(vars[5:18]),
v.names='Score',
direction='long',
timevar='Descriptor',
times=vars[5:18])
choc$Descriptor <- as.factor(choc$Descriptor)
choc$Score <- as.numeric(choc$Score)
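A quick look at the reshaped data (my addition, just a sanity check and not part of the original analysis) confirms the long format with one Score per panelist, session, rank, product and descriptor:
# sanity check of the long format after the reshape
str(choc)
head(choc)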
Model
sensory panel data
To explain the model, I first need to explain about sensory panel data. The objective of a sensory experiment is to obtain a profile: a table which shows how strongly a food tastes on selected descriptors (pre-defined properties). To obtain this, panelists taste the food stuffs and score their strength on the descriptors. The order in which the food stuffs are tasted is registered as rounds. Unfortunately panelists have 'errors' in their scores. To minimize the effect of these errors, 10 to 20 panelists are used, possibly with repetitions.
standard analysis
In general these data are analyzed by descriptor with ANOVA models such as
score ~ product + panelist + round + session + error,
possibly with second-order interaction terms, of which panelist*product would be the largest.
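For reference, such a per-descriptor ANOVA can be run directly on the long-format data. A minimal sketch (my addition; the descriptor CocoaA and the fixed-effects-only formula are only illustrative) would be:
# per-descriptor ANOVA sketch; Rank plays the role of round in these data
aa <- aov(Score ~ Product + Panelist + Rank + Session,
    data=choc[choc$Descriptor=='CocoaA',])
summary(aa)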
Alternatively, methods such as Procrustes or STATIS are used. These methods try to find communality in the n-dimensional configurations which the products occupy in the descriptor space. In particular, Procrustes tries to find a common configuration by shifting, scaling and rotating the individual configurations so that they all align. Note that this process destroys the link between products and descriptors.
building a model
The aim of this post is to build the bare bones of a model which combines the good parts of Procrustes with the features of a linear model. The model must analyze all descriptors in one go, hence one part of the model is the profile. Panelist effects are covered with a shift and a shrinking of the profile. At this point the shift and the scale are each covered by one parameter per panelist; this will probably be changed in a second version of the model. A rounds effect has been added, common to all descriptors and all panelists, another spot for model extensions. A few priors have been added on scale usage; the values seemed reasonable for a trained panel.
model1 <-
'
data {
int<lower=0> nobs;
int<lower=0> npanelist;
int<lower=0> nsession;
int<lower=0> nround;
int<lower=0> nproduct;
int<lower=0> ndescriptor;
vector[nobs] y;
int<lower=1,upper=npanelist> panelist[nobs];
int<lower=1,upper=nround> rounds[nobs];
int<lower=1,upper=nproduct> product[nobs];
int<lower=1,upper=ndescriptor> descriptor[nobs];
real maxy;
}
parameters {
matrix<lower=0,upper=maxy>[nproduct,ndescriptor] profile;
vector[npanelist] shift;
vector[npanelist] logsensitivity;
vector[nround] roundeffect;
real<lower=0> sigma;
}
transformed parameters {
vector [nobs] expect;
vector[npanelist] sensitivity;
// the following variables are not strictly needed but
// greatly increase speed and # effective samples
real meanshift;
real mlogsens;
real mroundeff;
meanshift <- mean(shift);
mlogsens <- mean(logsensitivity);
mroundeff <- mean(roundeffect);
// end of definitions of variables for speed
for (i in 1:npanelist) {
sensitivity[i] <- pow(10,logsensitivity[i]-mlogsens);
}
for (i in 1:nobs) {
expect[i] <- profile[product[i],descriptor[i]]
*sensitivity[panelist[i]]
+ shift[panelist[i]]-meanshift
+ roundeffect[rounds[i]]-mroundeff;
}
}
model {
logsensitivity ~ normal(0,0.1);
shift ~ normal(0,maxy/10);
roundeffect ~ normal(0,maxy/10);
y ~ normal(expect,sigma);
}
'
running the model
These are just the parts needed to get rstan running.
library(rstan)
datainchoc <- with(choc,list(
panelist=(1:nlevels(Panelist))[Panelist],
npanelist=nlevels(Panelist),
session=(1:nlevels(Session))[Session],
nsession=nlevels(Session),
rounds=(1:nlevels(Rank))[Rank],
nround=nlevels(Rank),
product=(1:nlevels(Product))[Product],
nproduct=nlevels(Product),
descriptor=(1:nlevels(Descriptor))[Descriptor],
ndescriptor=nlevels(Descriptor),
y=Score,
nobs=length(Score),
maxy=10))
pars <- c('profile','shift','roundeffect','sensitivity','sigma')
fitchoc <- stan(model_code = model1,
data = datainchoc,
pars=pars,
iter = 1100,
warmup=100,
chains = 4)
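A quick numeric summary of the fit can be printed before looking at the profile; this is an addition of mine, with the parameter selection purely illustrative:
# summary of selected parameters; probs chosen to match the 10%/90% intervals used later
print(fitchoc, pars=c('sigma','roundeffect'), probs=c(.1,.5,.9))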
results
Plotting of the chains revealed that 100 warmup samples were sufficient. A quick plot of the output:
plot(fitchoc)
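The chain inspection behind that warmup statement is not shown above; a minimal sketch using rstan's traceplot (the choice of sigma is just an example) would be:
# inspect chains including the warmup phase to judge whether 100 warmup samples suffice
traceplot(fitchoc, pars='sigma', inc_warmup=TRUE)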
Obviously, for a sensory analyst it is all about the profile. What could be nicer than to plot it with ggplot2 and include intervals:
samples <- extract(fitchoc,'profile')$profile
profile <- expand.grid(Product=levels(choc$Product),
Descriptor=levels(choc$Descriptor))
profile$des <- as.numeric(profile$Descriptor)
profile$prod <- as.numeric(profile$Product)
profs <- as.data.frame(t(sapply(1:nrow(profile),function(i){
subsamples <- samples[,profile$prod[i],profile$des[i]]
c(mean=mean(subsamples),quantile(subsamples,c(.1,.9)))
})))
names(profs) <- c('score','lmin','lmax')
profile <- cbind(profile,profs)
library(ggplot2)
p1 <- ggplot(profile, aes(y=score, x=des,color=Product))
p1 + geom_path() +
scale_x_continuous(
breaks=1:nlevels(profile$Descriptor),
labels=levels(profile$Descriptor)) +
xlab('') +
geom_ribbon(aes(ymin=lmin, ymax=lmax,fill=Product),
alpha=.15,linetype='blank') +
coord_flip()