[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have read about people doing a Bayesian PCA at some points and always wondered how that would work. Then, at some point I thought of a way to do so. As ideas evolved my interest became not PCA as such, but rather in a prefmap. As a first step in that this post contains the mapping from a sensory space to a two dimensional space. For prefmap this step is commonly done via a PCA.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
Data are the coctail data from sensominer package.Algorithm
The calculation is mostly inspired by the many PLS algorithms to which I was exposed when I was doing chemometrics. Scores and loadings may be obtained from each other by multiplying with the data matrix. In this case it means I just take a set of product scores and obtain the associated descriptors via a simple matrix multiplication. The resulting product and descriptor vectors can be used to reconstruct the original matrix; the best solution minimizes difference between the constructed and original data. For dimension two subtract reconstructed data from original data and repeat on residuals.Scaling
PCA has the possibility to have unit length scores or loadings, or R and Q mode if that is your favorite jargon. If one has a more singular value decomposition look, it is just where the eigenvalues go. At this point I made the choice to do that in the variable space.
Unique solution
PCA is known not to have one unique solution; each solution is equivalent to its mirror image. It seemed most elegant to do this completely at the end, after inspection of the data it seemed the location of product 12 was suitable for making the solution unique, since it was extreme on both dimensions. The final step (generated quantities) forces the location to be top right quadrant for data reported.Code
library(rstan)nprod <- 16
ndescr <- 13
sprofile <- as.matrix(scale(senso.cocktail,scale=FALSE))
datain <- list(
nproduct=nprod,
ndescriptor=ndescr,
profile=sprofile
)
model1 <- “
data {
int<lower=0> ndescriptor;
int<lower=0> nproduct;
matrix[nproduct,ndescriptor] profile;
}
parameters {
row_vector[nproduct] prodsp1;
row_vector[nproduct] prodsp2;
real<lower=0,upper=1> sigma1;
real<lower=0,upper=1> sigma2;
}
transformed parameters {
vector [ndescriptor] descrsp1;
vector [ndescriptor] descrsp2;
matrix[nproduct,ndescriptor] expected1;
matrix[nproduct,ndescriptor] expected2;
matrix[nproduct,ndescriptor] residual1;
descrsp1 <- profile’*prodsp1′;
expected1 <- (descrsp1*prodsp1)’;
residual1 <- profile-expected1;
descrsp2 <- profile’*prodsp2′;
expected2 <- (descrsp2*prodsp2)’;
}
model {
for (r in 1:nproduct) {
prodsp1[r] ~ normal(0,1);
prodsp2[r] ~ normal(0,1);
for (c in 1:ndescriptor) {
profile[r,c] ~ normal(expected1[r,c],sigma1);
residual1[r,c] ~ normal(expected2[r,c],sigma2);
}
}
}
generated quantities {
vector [ndescriptor] descrspace1;
vector [ndescriptor] descrspace2;
row_vector [nproduct] prodspace1;
row_vector [nproduct] prodspace2;
prodspace1 <-(
((prodsp1[12]>0)*prodsp1)-
((prodsp1[12]<0)*prodsp1)
);
prodspace2 <-(
((prodsp2[12]>0)*prodsp2)-
((prodsp2[12]<0)*prodsp2)
);
descrspace1 <-(
((prodsp1[12]>0)*descrsp1)-
((prodsp1[12]<0)*descrsp1)
);
descrspace2 <-(
((prodsp2[12]>0)*descrsp2)-
((prodsp2[12]<0)*descrsp2)
);
}
“
pars <- c(‘prodspace1′,’prodspace2′,’descrspace1′,’descrspace2’)
fit1 <- stan(model_code = model1,
data = datain,
pars=pars)
Results
For comparison, first a standard biplot.Product space
It is not difficult to extract the samples and plot them. See end of post. One notable property of the plot is that the products are in ellipses with the minor axis towards the center. Apparently part of variation between MCMC samples is rotational freedom between dimensions. Other than that the solution is actually pretty close to the PCADescriptor space
The rotational freedom is even more clear here.Additional code
data
library(SensoMineR)data(cocktail)
biplot
pr <- prcomp(senso.cocktail)plot(pr)
biplot(pr)
product plot
fit1samps <- as.data.frame(fit1)prod <- reshape(fit1samps,
drop=names(fit1samps)[33:59],
direction=’long’,
varying=list(names(fit1samps)[1:16],
names(fit1samps)[17:32]),
timevar=’sample’,
times=1:16,
v.names=c(‘PDim1′,’PDim2’)
)
prod <- prod[order(prod$PDim1),]
plot(prod$PDim1,prod$PDim2,
col=c(2,17,3,4,6,5,7:10,13,12,11,14:16)[prod$sample],
pch=46,
cex=2,
xlim=c(-1,1)*.75,
ylim=c(-1,1)*.75)
sa <- sapply(1:16,function(x)
c(sample=x,
Dim1=mean(prod$PDim1[prod$sample==x]),
Dim2=mean(prod$PDim2[prod$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=sa$sample,cex=1.5)
descriptor plot
descr <- reshape(fit1samps,drop=names(fit1samps)[c language=”(1:32,59)”][/c],
direction=’long’,
varying=list(names(fit1samps)[33:45],
names(fit1samps)[46:58]),
timevar=’sample’,
times=1:13,
v.names=c(‘DDim1′,’DDim2’)
)
descr <- descr[order(descr$DDim1),]
plot(descr$DDim1,descr$DDim2,
col=c(2,1,3:13)[descr$sample],
pch=46,
cex=2,
xlim=c(-1,1)*9,
ylim=c(-1,1)*9)
sa <- sapply(1:13,function(x)
c(sample=x,
Dim1=mean(descr$DDim1[descr$sample==x]),
Dim2=mean(descr$DDim2[descr$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=names(senso.cocktail))
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.