Bayesian analysis of sensory profiling data III

Posted on February 16, 2014 by Wingfeet in R bloggers | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week I extended my Bayesian model. So this week I wanted to test it with different data. There is one other data set with profiling data in R, french fries in the reshape package. ‘This data was collected from a sensory experiment conducted at Iowa State University in 2004. The investigators were interested in the effect of using three different fryer oils had on the taste of the fries’. In this post I try t analyze the data, however the documentation which I found is so limited, it is needed to second guess the objectives of the experiment.

Data

The data includes a treatment and a time variable. The time is in weeks and crossed with the other variables, suggesting that this is systematically varied and hence in itself a factor of interest. In the end I decided to cross treatments and time to generate a product factor with 30 levels. In addition I crossed time with a repetition factor so this effect could function as a random variable. As bonus, there are some missing data, which Stan does not like and are removed before analysis.
data(french_fries,package=’reshape’)
head(french_fries)
vars <- names(french_fries)
fries <- reshape(french_fries,
        idvar=vars[1:4],
        varying=list(vars[5:9]),
        v.names=’Score’,
        direction=’long’,
        timevar=’Descriptor’,
        times=vars[5:9])
head(fries)
fries$Product <- interaction(fries$treatment,fries$time)
fries$Session <- interaction(fries$time,fries$rep)
fries$Descriptor <- factor(fries$Descriptor)
fries <- fries[complete.cases(fries),]

Model

The model is a variation on last week, because I there is no rounds information but there are sessions. In addition, I made the range for the shift variable a bit larger, because it appeared some panelists have extreme behavior.
model1 <-
        ‘
    data {
        int npanelist;
        int nobs;
        int nsession;
        int ndescriptor;
        int nproduct;
        vector[nobs] y;
        int panelist[nobs];
        int product[nobs];
        int descriptor[nobs];
        int session[nobs];
        real maxy;
    }
    parameters {
        matrix [nproduct,ndescriptor] profile;
        vector[npanelist] shift[ndescriptor];
        vector[npanelist] sitting[nsession];
        vector [npanelist] logsensitivity;
        real varr;
        vector [npanelist] lpanelistvar;
        vector [ndescriptor] ldescriptorvar;
        real sigmashift;
        real sigmasitting;
    }
    transformed parameters {
        vector [nobs] expect;
        vector[npanelist] sensitivity;
        real mlogsens;
        real mlpanelistvar;
        real mldescriptorvar;
        real meanshift[ndescriptor];
        real meansit[nsession];
        vector [nobs] sigma;
        mlogsens <- mean(logsensitivity);
        for (i in 1:ndescriptor) {
            meanshift[i] <- mean(shift[i]);
        }
        mlpanelistvar <- mean(lpanelistvar);
        mldescriptorvar <- mean(ldescriptorvar);
        for (i in 1:nsession) {
            meansit[i] <- mean(sitting[i]);
        }
        for (i in 1:npanelist) {
            sensitivity[i] <- pow(10,logsensitivity[i]-mlogsens);
        }
        for (i in 1:nobs) {
            expect[i] <- profile[product[i],descriptor[i]]
                *sensitivity[panelist[i]]
                + shift[descriptor[i],panelist[i]]-meanshift[descriptor[i]]
                + sitting[session[i],panelist[i]]-meansit[session[i]];
            sigma[i] <- sqrt(varr
                *exp(lpanelistvar[panelist[i]]-mlpanelistvar)
                *exp(ldescriptorvar[descriptor[i]]-mldescriptorvar));
            }
    }
    model {
        logsensitivity ~ normal(0,0.1);
        for (i in 1: ndescriptor) {
            shift[i] ~ normal(0,sigmashift);
            ldescriptorvar[i] ~ normal(0,1);
        }
        for (i in 1:nsession)
            sitting[i] ~ normal(0,sigmasitting);
        for (i in 1:npanelist)
            lpanelistvar[i] ~ normal(0,1);
        y ~ normal(expect,sigma);
    }
    generated quantities    {
        vector [npanelist] panelistsd;
        vector [ndescriptor] descriptorsd;
        for (i in 1:npanelist) {
            panelistsd[i] <- sqrt(exp(lpanelistvar[i]));
        }
        for (i in 1:ndescriptor) {
            descriptorsd[i] <- sqrt(exp(ldescriptorvar[i]));
        }
    }
        ‘

Additional code

This is just the stuff needed to get Stan running
datainfries <- with(fries,list(
                panelist=(1:nlevels(subject))[subject],
                npanelist=nlevels(subject),
                session=(1:nlevels(Session))[Session],
                nsession=nlevels(Session),
                product=(1:nlevels(Product))[Product],
                nproduct=nlevels(Product),
                descriptor=(1:nlevels(Descriptor))[Descriptor],
                ndescriptor=nlevels(Descriptor),
                y=Score,
                nobs=length(Score),
                maxy=15))
pars <- c('profile','sensitivity','shift','sitting',
        ‘panelistsd’,’descriptorsd’)
library(rstan)
fitfries <- stan(model_code = model1,
        data = datainfries,
        pars=pars,
        iter = 1100,
        warmup=100,
        chains = 4)

Results

Traceplot is not shown, but the general plot provides some overview.

Plot of the profile plot has been adapted given the design. This plot should convey the message the data contains. As time progresses the flavor becomes more rancid and pointy, less potato. This process starts at circa 4 weeks.Treatment 3 seems least affected, but the difference is minute.

samples <- extract(fitfries,'profile')$profile
nsamp <- dim(samples)[1]
profile <- expand.grid(Product=levels(fries$Product),
        Descriptor=levels(fries$Descriptor))
profile$des <- as.numeric(profile$Descriptor)
profile$prod <- as.numeric(profile$Product)
profs <- as.data.frame(t(sapply(1:nrow(profile),function(i){
            subsamples <-c(samples[1:nsamp,
                             profile$prod[i],profile$des[i]])
            c(mean=mean(subsamples),quantile(subsamples,c(.1,.9)))
                        })))
names(profs) <- c('score','lmin','lmax')
profile <- cbind(profile,profs)
profile <- merge(profile,
           unique(fries[,c(‘Product’,’treatment’,’time’)])
           )

library(ggplot2)

profile$timen <- c(1:12)[profile$time]
profile <- profile[order(profile$timen),]

p1 <- ggplot(profile, aes(y=score, x=timen,color=treatment))

p1 + geom_path() +
        scale_x_continuous(
                breaks=1:nlevels(profile$time),
                labels=levels(profile$time)) +
        xlab(”) +
         geom_ribbon(aes(ymin=lmin, ymax=lmax,fill=treatment),
                 alpha=.15,linetype=’blank’) +
        facet_grid( Descriptor ~ . )

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Bayesian analysis of sensory profiling data III