Site icon R-bloggers

Mapping products in a space

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have read about people doing a Bayesian PCA at some points and always wondered how that would work. Then, at some point I thought of a way to do so. As ideas evolved my interest became not PCA as such, but rather in a prefmap. As a first step in that this post contains the mapping from a sensory space to a two dimensional space. For prefmap this step is commonly done via a PCA.

Data

Data are the coctail data from sensominer package.

Algorithm

The calculation is mostly inspired by the many PLS algorithms to which I was exposed when I was doing chemometrics. Scores and loadings may be obtained from each other by multiplying with the data matrix. In this case it means I just take a set of product scores and obtain the associated descriptors via a simple matrix multiplication. The resulting product and descriptor vectors can be used to reconstruct the original matrix; the best solution minimizes difference between the constructed and original data. For dimension two subtract reconstructed data from original data and repeat on residuals.

Scaling

PCA has the possibility to have unit length scores or loadings, or R and Q mode if that is your favorite jargon. If one has a more singular value decomposition look, it is just where the eigenvalues go. At this point I made the choice to do that in the variable space. 

Unique solution

PCA is known not to have one unique solution; each solution is equivalent to its mirror image. It seemed most elegant to do this completely at the end, after inspection of the data it seemed the location of product 12 was suitable for making the solution unique, since it was extreme on both dimensions. The final step (generated quantities) forces the location to be top right quadrant for data reported.

Code

library(rstan)
nprod <- 16
ndescr <- 13
sprofile <- as.matrix(scale(senso.cocktail,scale=FALSE))
datain <- list(
    nproduct=nprod,
    ndescriptor=ndescr,
    profile=sprofile
    )
   
model1 <- “
data {
        int<lower=0> ndescriptor;
        int<lower=0> nproduct;
        matrix[nproduct,ndescriptor] profile;
}
parameters {
    row_vector[nproduct] prodsp1;
    row_vector[nproduct] prodsp2;
    real<lower=0,upper=1> sigma1;
    real<lower=0,upper=1> sigma2;
}
transformed parameters {
   vector [ndescriptor] descrsp1;
   vector [ndescriptor] descrsp2;
   matrix[nproduct,ndescriptor] expected1;  
   matrix[nproduct,ndescriptor] expected2;  
   matrix[nproduct,ndescriptor] residual1;  

   descrsp1 <- profile’*prodsp1′;
   expected1 <- (descrsp1*prodsp1)’;
   residual1 <- profile-expected1;
   descrsp2 <- profile’*prodsp2′;
   expected2 <- (descrsp2*prodsp2)’;
}
model {  
     for (r in 1:nproduct) {
        prodsp1[r] ~ normal(0,1);
        prodsp2[r] ~ normal(0,1);
        for (c in 1:ndescriptor) {
           profile[r,c] ~ normal(expected1[r,c],sigma1);
           residual1[r,c] ~ normal(expected2[r,c],sigma2);
        }
     }
}
generated quantities {
   vector [ndescriptor] descrspace1;
   vector [ndescriptor] descrspace2;
   row_vector [nproduct] prodspace1;
   row_vector [nproduct] prodspace2;
   prodspace1 <-(
                     ((prodsp1[12]>0)*prodsp1)-
                     ((prodsp1[12]<0)*prodsp1)
                  );
   prodspace2 <-(
                     ((prodsp2[12]>0)*prodsp2)-
                     ((prodsp2[12]<0)*prodsp2)
                  ); 
   descrspace1 <-(
                     ((prodsp1[12]>0)*descrsp1)-
                     ((prodsp1[12]<0)*descrsp1)
                  );
   descrspace2 <-(
                     ((prodsp2[12]>0)*descrsp2)-
                     ((prodsp2[12]<0)*descrsp2)
                  ); 
}

pars <- c(‘prodspace1′,’prodspace2′,’descrspace1′,’descrspace2’)
fit1 <- stan(model_code = model1,
    data = datain,
    pars=pars)

Results

For comparison, first a standard biplot.

Product space

It is not difficult to extract the samples and plot them. See end of post. One notable property of the plot is that the products are in ellipses with the minor axis towards the center. Apparently part of variation between MCMC samples is rotational freedom between dimensions. Other than that the solution is actually pretty close to the PCA

Descriptor space

The rotational freedom is even more clear here.

Additional code

data

library(SensoMineR)
data(cocktail)

biplot

pr <- prcomp(senso.cocktail) 
plot(pr)
biplot(pr)

product plot

fit1samps <- as.data.frame(fit1)

prod <- reshape(fit1samps,
    drop=names(fit1samps)[33:59],
    direction=’long’,
    varying=list(names(fit1samps)[1:16],
        names(fit1samps)[17:32]),
    timevar=’sample’,
    times=1:16,
    v.names=c(‘PDim1′,’PDim2’)
)
   
prod <- prod[order(prod$PDim1),]
plot(prod$PDim1,prod$PDim2,
    col=c(2,17,3,4,6,5,7:10,13,12,11,14:16)[prod$sample],
    pch=46,
    cex=2,
    xlim=c(-1,1)*.75,
    ylim=c(-1,1)*.75)
sa <- sapply(1:16,function(x)
        c(sample=x,
            Dim1=mean(prod$PDim1[prod$sample==x]),
            Dim2=mean(prod$PDim2[prod$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=sa$sample,cex=1.5)

descriptor plot

descr <- reshape(fit1samps,
    drop=names(fit1samps)[c language=”(1:32,59)”][/c],
    direction=’long’,
    varying=list(names(fit1samps)[33:45],
        names(fit1samps)[46:58]),
    timevar=’sample’,
    times=1:13,
    v.names=c(‘DDim1′,’DDim2’)
)

descr <- descr[order(descr$DDim1),]
plot(descr$DDim1,descr$DDim2,
    col=c(2,1,3:13)[descr$sample],
    pch=46,
    cex=2,
    xlim=c(-1,1)*9,
    ylim=c(-1,1)*9)
sa <- sapply(1:13,function(x)
        c(sample=x,
            Dim1=mean(descr$DDim1[descr$sample==x]),
            Dim2=mean(descr$DDim2[descr$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=names(senso.cocktail))

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.