"R": Predicting a Test Set (Gasoline)
[This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers].
> library(pls)
> data(gasoline)
> #60 spectra of gasoline (octane is the constituent)
> #We divide the whole Set into a Train Set and a Test Set.
> gasTrain<-gasoline[1:50,]
> gasTest<-gasoline[51:60,]
> #Let's develop the PLSR model with the Train Set and leave-one-out (LOO) cross-validation.
> gas1<-plsr(octane~NIR,ncomp=10,data=gasTrain,validation="LOO")
> summary(gas1)
Data:   X dimension: 50 401
        Y dimension: 50 1
Fit method: kernelpls
Number of components considered: 10

VALIDATION: RMSEP
Cross-validated using 50 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV           1.545    1.357   0.2966   0.2524   0.2476   0.2398   0.2319
adjCV        1.545    1.356   0.2947   0.2521   0.2478   0.2388   0.2313
       7 comps  8 comps  9 comps  10 comps
CV      0.2386   0.2316   0.2449    0.2673
adjCV   0.2377   0.2308   0.2438    0.2657

TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X         78.17    85.58    93.41    96.06    96.94    97.89    98.38    98.85
octane    29.39    96.85    97.89    98.26    98.86    98.96    99.09    99.16
        9 comps  10 comps
X         99.02     99.19
octane    99.28     99.39
> #For this exercise, we choose 3 components.
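The choice of the number of components from the cross-validation results can also be inspected graphically or with a selection heuristic. The following is a minimal sketch, not part of the original session: it rebuilds the same model and then uses `validationplot()` and `selectNcomp()` from the pls package. The "onesigma" rule and the variable name `n_onesigma` are assumptions chosen for illustration; the heuristic may well suggest a different number than the 3 components chosen here.

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# Plot the cross-validated RMSEP against the number of components
validationplot(gas1, val.type = "RMSEP")

# One-sigma heuristic (an assumption for this sketch): the smallest model
# whose CV error is within one standard error of the overall minimum
n_onesigma <- selectNcomp(gas1, method = "onesigma", plot = FALSE)
print(n_onesigma)
```

The plot is usually the first thing to look at; the heuristic is only a second opinion and does not replace judgement about the application.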
> #Let's predict our Test Set with this 3-component model.
> predict(gas1,ncomp=3,newdata=gasTest)
, , 3 comps

     octane
51 87.94907
52 87.30484
53 88.21420
54 84.86945
55 85.24244
56 84.57502
57 87.37650
58 86.78971
59 89.10282
60 86.97223
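To see how far these predictions are from the reference octane values, the prediction error can be computed by hand. The sketch below is an illustration rather than part of the original session; the names `pred`, `ref`, `rmsep_manual` and `bias` are hypothetical. It assumes that the test-set RMSEP is the plain root mean squared error over the 10 test samples, in which case the result should agree with the 3-component value reported by `RMSEP(gas1, newdata = gasTest)`.

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gasTest  <- gasoline[51:60, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# predict() returns a 3-D array (samples x responses x models);
# drop() reduces it to a plain vector for the single 3-component model
pred <- drop(predict(gas1, ncomp = 3, newdata = gasTest))
ref  <- gasTest$octane

# Root mean squared error of prediction and mean bias on the test set
rmsep_manual <- sqrt(mean((ref - pred)^2))
bias <- mean(pred - ref)

# Reference and predicted values side by side
print(data.frame(reference = ref, predicted = pred, residual = ref - pred))
print(c(RMSEP = rmsep_manual, bias = bias))
```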
> #To Plot these data:
> predplot(gas1,ncomp=3,newdata=gasTest,asp=1,line=TRUE)
> #Let's look at the RMSEP statistic. It is a very useful tool for deciding whether 3 components is fine or whether we should choose more or fewer.
> RMSEP(gas1,newdata=gasTest)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
     1.5369       1.1696       0.2445       0.2341       0.3287       0.2780
    6 comps      7 comps      8 comps      9 comps     10 comps
     0.2703       0.3301       0.3571       0.4090       0.6116
> #It's fine; we could also consider choosing only two components. The test-set RMSEP with 3 components is 0.234.
> #The cross-validation RMSEP (CV) for the model with 3 components was 0.252.
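The comparison between the cross-validation error and the test-set error can be laid out for every number of components at once. The following sketch is an assumption about how one might do this, not the author's own code; the names `cv_rmsep`, `test_rmsep`, `comparison` and `best_test` are hypothetical. It relies on the `val` element of the object returned by `RMSEP()`, whose first entry corresponds to the intercept-only model (0 components).

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gasTest  <- gasoline[51:60, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# RMSEP estimates for 0 (intercept only) to 10 components
cv_rmsep   <- unname(drop(RMSEP(gas1, estimate = "CV")$val))
test_rmsep <- unname(drop(RMSEP(gas1, newdata = gasTest)$val))

# One row per model size, CV error next to test-set error
comparison <- data.frame(ncomp = 0:10, CV = cv_rmsep, test = test_rmsep)
print(round(comparison, 4), row.names = FALSE)

# Number of components with the lowest test-set RMSEP
best_test <- comparison$ncomp[which.min(comparison$test)]
print(best_test)
```

With only 10 test samples, small differences between neighbouring model sizes should be read with caution.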
> #R really is a wonderful tool for developing regressions and for better understanding what lies behind the algorithms.
> #There is plenty of literature on the internet to help you start working with R.
> #Thanks to Bjørn-Helge Mevik & Ron Wehrens for their good tutorials on the pls package; they have helped me understand this program better and keep learning (I have ordered some books).