PLS2 with "R"

Posted on September 22, 2012 by jrcuesta in R bloggers | 0 Comments

[This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I´ve been working these days with PLS2 calibrations with a chemometric software called “Unscrambler” with a data set called “jam”. I said “can I develop PLS2 models with R?”.I look in the book “Introduction to Multivariate Statistical Analysis in Chemometrics”, and I got the response “Yes, we can”.

I have other posts for PLS regressions, but it is PLS1, where we have an X matrix (spectra) and we make a regression for one constituent of the Y matrix at a time. What about to make the regression for all the constituents at the same time using the whole Y matrix?. That is the purpose of PLS2.

PLS2 is recommended when there is a high correlation between the constituents.

library(chemometrics)

data(cereal)

This data is part of a set used by Varmuza et al. 2008, for other papers.

You can get a description for this data in the R help page:

Description

For 15 cereals an X and Y data set, measured on the same objects, is available. The X data are 145 infrared spectra, and the Y data are 6 chemical/technical properties (Heating value, C, H, N, Starch, Ash). Also the scaled Y data are included (mean 0, variance 1 for each column). The cereals come from 5 groups B=Barley, M=Maize, R=Rye, T=Triticale, W=Wheat.

Once loaded, take a look to the data

dim(cereal$X)

dim(cereal$Ysc)

We can have a look to the spectra, (it is already treated with SG and first derivative).

wavelengths<-seq(1126,2278,by=8)

matplot(wavelengths,t(cereal$X),lty=1,xlab=”wavelengths(nm)”,ylab=”log(1/R)”)

Now let´s run PLS2, using “mvsr”, with “LOO” (leave one out) cross validation.

cerpls2<-mvr(Ysc~X,data=cereal,method="simpls",validation="LOO")

We can see a summary of the results:

summary(cerpls2)

Now we have to take an important decision, “How many terms to choose?”.

Plots can help us with it:

plot(RMSEP(cerpls2), legendpos = “topright”)

We have to select an average, and looking to the plots we can say that 7 is fine, anyway for “starch” less terms would be fine, but for the rest 6 or 7 is the correct number.

To leave a comment for the author, please follow the link and comment on their blog: NIR-Quimiometría.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

"R" Chemometrics

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

PLS2 with "R"

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)