[This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It is clear that MSC does not remove all of the scatter from the raw spectra, so some of the information remains hidden by it. Improving the sample presentation would help to reduce the scatter.
We know that the first loading is closely related to the main source of variance (in this case the scatter). In the next figure, I overplot the first loading and the standard deviation spectrum (multiplied by 10, so that the two can be compared easily).
> sd10<-sdfattyac_msc_c*10   # assumed: SD spectrum multiplied by 10, as described above
> l1sd10<-cbind(loading1,sd10)
> plot(t(l1sd10))
> matplot(wavelengths,l1sd10,lty=1,pch=21)
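For readers without the data file, the same comparison can be sketched on simulated spectra. Everything here is an assumption made for illustration: the matrix `X`, the synthetic band and the baseline offsets are invented, and the wavelength grid (850 to 1050 nm in 2 nm steps) is only inferred from the data points reported for the peaks further down.

```r
# Simulated stand-in for MSC-treated spectra: 60 samples x 101 wavelengths.
set.seed(1)
wavelengths <- seq(850, 1050, by = 2)            # assumed grid, 2 nm steps
n <- 60
band <- exp(-((wavelengths - 930) / 15)^2)       # one synthetic absorption band
offset <- rnorm(n, sd = 0.05)                    # scatter-like baseline shift
X <- outer(rep(1, n), 0.5 + 0.2 * band) +        # common mean spectrum
     outer(offset, rep(1, length(wavelengths))) +  # per-sample offset
     outer(rnorm(n, sd = 0.01), band)            # small variation in band height

# PCA on the mean-centred spectra; column 1 of rotation is the first loading.
pca <- prcomp(X, center = TRUE, scale. = FALSE)
loading1 <- pca$rotation[, 1]

# Standard deviation spectrum, multiplied by 10 to bring it to a similar scale.
sd10 <- apply(X, 2, sd) * 10

l1sd10 <- cbind(loading1, sd10)
matplot(wavelengths, l1sd10, type = "l", lty = 1, col = c("blue", "red"),
        xlab = "Wavelength (nm)", ylab = "Loading 1 / SD x 10")
legend("topright", legend = c("Loading 1", "SD x 10"),
       col = c("blue", "red"), lty = 1)
```

The sign of a PCA loading is arbitrary, so the first loading may appear mirrored with respect to the standard deviation spectrum; only the shape is meaningful in this comparison.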
The second loading will give us more detail about the band positions.
I'm going to use the function "findPeaks", from the package "quantmod".
> findPeaks(loading2)
X878 X932 X972
15 42 62
The band at 932 nm (data point 42) is probably due to a C-H third overtone vibration of fat. The band at 972 nm is related to the C-H2 vibration and to water. The band at 878 nm also seems to be related to fat.
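The idea behind peak finding can be sketched in base R. This is not the package's implementation, only a minimal local-maximum search on a synthetic curve; the helper `local_maxima()` and the simulated `loading2` are hypothetical, and the wavelength grid (850 to 1050 nm in 2 nm steps) is assumed.

```r
# Return the indices of the strict local maxima of a numeric vector.
local_maxima <- function(x) {
  # sign(diff(x)) is +1 while the curve rises and -1 while it falls;
  # a change from +1 to -1 marks a peak at the point in between.
  which(diff(sign(diff(x))) < 0) + 1
}

wavelengths <- seq(850, 1050, by = 2)            # assumed grid, 2 nm steps

# Synthetic "loading": three narrow Gaussian bands (not the article's data).
loading2 <- exp(-((wavelengths - 878) / 6)^2) +
            exp(-((wavelengths - 932) / 6)^2) +
            exp(-((wavelengths - 972) / 6)^2)

peaks <- local_maxima(loading2)
peaks                # data point indices of the maxima
wavelengths[peaks]   # corresponding wavelengths in nm
```

Whatever function is used, it is worth checking the reported indices against the plot of the loading, since different implementations can count positions differently.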
We can also interpret the other loadings, where possible.
We saw that one of the samples (66) has a Mahalanobis distance (MD) of 11.6. Let's look at the values of the six constituents for this sample:
> fattyac_msc[66,1:6]
   C16_0 C16_1 C18_0 C18_1 C18_2 C18_3
66  15.8     2     6  62.3  10.2   0.6
Let's compare these values with the summary:
> summary(fattyac_msc)
C16_0 C16_1
Min. : 0.00 Min. :1.500
1st Qu.:20.10 1st Qu.:2.000
Median :21.00 Median :2.200
Mean :21.34 Mean :2.267
3rd Qu.:22.90 3rd Qu.:2.500
Max. :26.00 Max. :3.500
C18_0 C18_1
Min. : 5.800 Min. :43.80
1st Qu.: 8.600 1st Qu.:51.95
Median : 9.400 Median :54.50
Mean : 9.711 Mean :53.93
3rd Qu.:10.500 3rd Qu.:56.15
Max. :14.000 Max. :62.30
C18_2 C18_3
Min. : 5.500 Min. :0.3000
1st Qu.: 7.600 1st Qu.:0.5000
Median : 8.500 Median :0.6000
Mean : 8.503 Mean :0.6032
3rd Qu.: 9.100 3rd Qu.:0.7000
Max. :14.700 Max. :1.3000
Sample 66 has the highest value for C18:1 (oleic acid): its 62.3 is the maximum in the summary. Even so, it is not isolated in the histogram. For some reason this sample differs from the others, especially from 1000 to 1050 nm. We will wait before making a decision about this sample.
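Comparing one sample with the summary by eye can be complemented with a z-score and a percentile for each constituent. The sketch below uses a simulated table `Y` with only two columns; the data, the means and the standard deviations are invented for illustration and are not the values of this data set.

```r
set.seed(42)
# Hypothetical stand-in for the constituent table (not the article's data).
Y <- data.frame(C18_1 = rnorm(100, mean = 54, sd = 3),
                C18_2 = rnorm(100, mean = 8.5, sd = 1.5))

# For illustration, pick the sample with the highest C18_1 value.
i <- which.max(Y$C18_1)
sample_values <- unlist(Y[i, ])

# z-score: distance from the column mean, in standard deviation units.
z <- (sample_values - colMeans(Y)) / sapply(Y, sd)

# Percentile: fraction of samples with a value at or below this sample's.
pct <- mapply(function(column, v) mean(column <= v), Y, sample_values)

round(rbind(z = z, percentile = pct), 2)
```

A sample at the maximum of a column has a percentile of 1 by construction; the z-score shows how far from the mean that maximum actually lies.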
Until now we have been working with the X matrix.
Now we start to study the Y matrix. The first thing to do is to have a look at the summary and, of course, at the histograms.
If you want to follow this tutorial, please send me an e-mail and I'll send you the "txt" file.
> hist(C16_0,col="red")
> hist(C16_1,col="blue")
> hist(C18_0,col="green")
> hist(C18_1,col="brown")
> hist(C18_2,col="violet")
> hist(C18_3,col="orange")
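The six histograms can also be drawn in a single figure with a loop. This is only a sketch: the table `Y` is simulated (the column names follow the ones above, the values are invented), so it should be replaced by the real constituent data.

```r
set.seed(7)
# Hypothetical constituent table; values are simulated, not the real data.
Y <- data.frame(C16_0 = rnorm(100, 21, 2),   C16_1 = rnorm(100, 2.3, 0.4),
                C18_0 = rnorm(100, 9.7, 1.5), C18_1 = rnorm(100, 54, 3),
                C18_2 = rnorm(100, 8.5, 1.5), C18_3 = rnorm(100, 0.6, 0.15))
cols <- c("red", "blue", "green", "brown", "violet", "orange")

op <- par(mfrow = c(2, 3))          # 2 x 3 grid of panels
for (j in seq_along(Y)) {
  hist(Y[[j]], col = cols[j], main = names(Y)[j], xlab = "Value")
}
par(op)                             # restore the previous layout
```

Having all the constituents side by side makes it easier to spot skewed distributions or isolated samples at a glance.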
To leave a comment for the author, please follow the link and comment on their blog: NIR-Quimiometría.