
Shootout 2012 : first PLS regressions

[This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers.]
It's time to start developing some regressions in order to find the best math treatment, the best number of terms, the best spectral regions, the best regression method, and so on.

This time I'm working with the pls package in R and, to become more familiar with it, I use PLS regression over the full spectral range with two math treatments: MSC and SG (Savitzky-Golay) filters, the latter with first and second derivatives. In other posts I will try selecting spectral regions, or even other regression methods.
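As a rough illustration of what MSC does, here is a minimal base-R sketch. It is not the code used for the results in this post: the function name `msc_sketch` and the simulated spectra are assumptions made for the example. Each spectrum is regressed on the mean spectrum, and the fitted offset and slope are then removed. The pls package also provides an `msc()` function for the same purpose.

```r
# Multiplicative scatter correction (MSC), written out in base R.
# X: matrix of spectra (one row per sample, one column per wavelength).
# msc_sketch is a hypothetical helper name, not a function from any package.
msc_sketch <- function(X, reference = colMeans(X)) {
  corrected <- t(apply(X, 1, function(x) {
    # Fit x = offset + slope * reference by least squares.
    coefs <- lm.fit(cbind(1, reference), x)$coefficients
    (x - coefs[1]) / coefs[2]
  }))
  dimnames(corrected) <- dimnames(X)
  corrected
}

# Simulated example: one underlying spectrum with sample-specific
# additive offsets and multiplicative scatter effects.
base_spectrum <- sin(seq(0, pi, length.out = 50))
offsets <- c(0.1, -0.2, 0.3)
slopes  <- c(0.8, 1.0, 1.3)
X <- t(sapply(1:3, function(i) offsets[i] + slopes[i] * base_spectrum))

X_msc <- msc_sketch(X)

# Largest difference between samples at any wavelength after correction.
max_spread <- max(apply(X_msc, 2, function(col) diff(range(col))))
```

In this noise-free simulation the three corrected spectra collapse onto the mean spectrum, so `max_spread` is zero up to rounding. With real spectra the chemical differences between samples remain after the correction.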

Instead of looking at the cross-validation statistics, I will look at the prediction statistics for the test set. We have seen that the samples in this set are not fully represented by the training set, so if we predict them well it is a sign that the equation is robust. Don't forget that the idea is to predict, as well as possible, a validation set whose values we do not know in theory (we already know them, and in the future I will compare my results with those of the winner and of other participants).
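To make the statistic concrete, here is a small sketch of how a test-set RMSEP can be computed by hand. The numbers are made up for illustration and are not Shootout data; `rmsep` is a hypothetical helper name. The baseline, which predicts the training mean for every test sample, is to my understanding what the "(Intercept)" column in the pls `RMSEP()` output corresponds to.

```r
# Root mean squared error of prediction for a set of samples.
rmsep <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Toy numbers, invented for this example only.
y_train <- c(5.0, 6.0, 7.0, 8.0)   # reference values, training set
y_test  <- c(5.5, 7.5, 9.0)        # reference values, test set
y_hat   <- c(5.4, 7.9, 8.5)        # predictions from some hypothetical model

# Error of the model on the test set.
rmsep_model <- rmsep(y_test, y_hat)

# Baseline: predict the training mean for every test sample.
rmsep_baseline <- rmsep(y_test, mean(y_train))
```

A model is only useful if its test-set RMSEP is clearly below this baseline.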

I develop a first regression (1) with MSC and look at the prediction statistics for the test set:
> Active_reg1 <- plsr(Active ~ NIT.msc, ncomp = 5, data = shootcalmsc.2012, validation = "LOO")
> RMSEP(Active_reg1, newdata = shoottestmsc.2012)

(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps 
     1.1637       0.6944       0.5028       0.4586       0.4913       0.5355
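For readers who do not have the Shootout files at hand, the same sequence of calls can be tried on simulated data. This is only a sketch: the spectra, the response and the object names (`all_data`, `cal`, `test`, `fit`) are invented for the example, and it assumes the pls package is installed.

```r
# Sketch on simulated data; runs only if the pls package is installed.
if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  set.seed(42)

  # Simulated "spectra" (150 samples x 20 variables) and a response
  # that depends on the first five variables plus noise.
  n <- 150; p <- 20
  X <- matrix(rnorm(n * p), n, p)
  y <- as.numeric(X[, 1:5] %*% c(1, -1, 0.5, 0.5, -0.5) + rnorm(n, sd = 0.2))

  # Store the spectra as a matrix column, as is usual with pls.
  all_data <- data.frame(Active = y, NIT = I(X))
  cal  <- all_data[1:100, ]     # calibration (training) set
  test <- all_data[101:150, ]   # independent test set

  fit <- plsr(Active ~ NIT, ncomp = 5, data = cal, validation = "LOO")

  # Supplying newdata gives the test-set estimate instead of the CV one.
  test_rmsep <- RMSEP(fit, newdata = test)
  print(test_rmsep)

  # Plain named vector: (Intercept), 1 comps, ..., 5 comps.
  rmsep_values <- drop(test_rmsep$val)
}
```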


Now the regression (2) with an SG filter (first derivative):
> Active_reg2 <- plsr(Active ~ NITsg, ncomp = 5, data = shootcalsg.2012, validation = "LOO")
> RMSEP(Active_reg2, newdata = shoottestsg.2012)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps 
     1.1637       1.0414       0.4172       0.4313       0.4531       0.4556


When the SG filter uses the second derivative instead, the RMSEP statistics are:
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps 
     1.1637       0.5506       0.4269       0.4227       0.4134       0.4009
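The three outputs above are easier to compare side by side. The following sketch simply copies the printed RMSEP values into a matrix and picks, for each math treatment, the number of components with the lowest test-set error. The object names are assumptions made for the example.

```r
# Test-set RMSEP values copied from the three outputs above.
rmsep_table <- rbind(
  MSC       = c(1.1637, 0.6944, 0.5028, 0.4586, 0.4913, 0.5355),
  SG_deriv1 = c(1.1637, 1.0414, 0.4172, 0.4313, 0.4531, 0.4556),
  SG_deriv2 = c(1.1637, 0.5506, 0.4269, 0.4227, 0.4134, 0.4009)
)
colnames(rmsep_table) <- c("(Intercept)", paste(1:5, "comps"))

# For each treatment: number of components with the lowest RMSEP.
# Column 1 is the intercept-only model, hence the "- 1".
best_ncomp <- apply(rmsep_table, 1, which.min) - 1
best_rmsep <- apply(rmsep_table, 1, min)

print(data.frame(best_ncomp, best_rmsep))
```

With these numbers the minimum falls at 3 components for MSC, 2 for the SG first derivative and 5 for the SG second derivative. The differences between neighbouring values are small, so a more parsimonious model may be a reasonable choice.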


We can have a look at the Predicted vs. Lab plots:
> predplot(Active_reg1, ncomp = 3, newdata = shoottestmsc.2012, asp = 1, line = TRUE, main = "MSC math-treatment")
> predplot(Active_reg2, ncomp = 2, newdata = shoottestsg.2012, asp = 1, line = TRUE, main = "SG second der")



Well, the plots are not very nice. It is clear that we can separate the two groups, but the results are not very accurate. I have to keep working on this to see whether I can improve the plots and the RMSEP.
We could play with the parameters of the SG filter, but I think it is better to select spectral regions. I will let you know in another post.
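For anyone who wants to experiment with the SG parameters (window size, polynomial order, derivative order), here is a minimal base-R sketch of the filter. It is not the code used for the results in this post: the helper names `sg_coefficients` and `sg_filter` are assumptions, and it assumes equally spaced wavelengths with unit spacing. Packages such as prospectr or signal offer ready-made Savitzky-Golay functions.

```r
# Savitzky-Golay coefficients from a local least-squares polynomial fit.
# window: odd number of points; poly: polynomial order; deriv: derivative order.
sg_coefficients <- function(window, poly, deriv = 0) {
  stopifnot(window %% 2 == 1, poly < window, deriv <= poly)
  half <- (window - 1) / 2
  A <- outer(-half:half, 0:poly, "^")          # design matrix of the local fit
  pseudo_inverse <- solve(crossprod(A), t(A))  # (A'A)^-1 A'
  factorial(deriv) * pseudo_inverse[deriv + 1, ]
}

# Apply the filter to every row (spectrum) of a matrix.
# Points closer to an edge than half a window are returned as NA.
sg_filter <- function(X, window, poly, deriv = 0) {
  w <- sg_coefficients(window, poly, deriv)
  t(apply(X, 1, function(x) {
    # stats::filter applies its weights in reversed order, hence rev(w).
    as.numeric(stats::filter(x, rev(w), sides = 2))
  }))
}

# Example on a simulated one-row "spectrum": x^2 sampled at x = 1, ..., 20.
X <- matrix((1:20)^2, nrow = 1)
d1 <- sg_filter(X, window = 5, poly = 2, deriv = 1)  # 2 * x away from the edges
d2 <- sg_filter(X, window = 7, poly = 2, deriv = 2)  # 2 away from the edges
```

A wider window smooths more but blurs narrow bands, and a higher derivative order removes more baseline while amplifying noise, so the choice is a trade-off to check against the test-set RMSEP. If the wavelength spacing is not 1, the derivative has to be divided by the spacing raised to the derivative order.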

If you are interested in this post, you may also find some of the previous ones interesting:
“Sample Sets” plots (Shootout-2012)
Shootout 2012: Test & Val Sets proyections
Working with Shootout – 2012 in R (001)
Shootout 2012 files

