
Tips & Tricks 8: Examining Replicate Error

[This article was first published on geomorph, and kindly contributed to R-bloggers.]
Geomorph users,

When starting out in a geometric morphometrics study, the common questions are ones of repeatability and measurement error.

How much of the variation in the Procrustes residuals is due to human (digitizing) error? How much is due to parallax (2D photographs)? How much is due to the threshold choice (3D surface meshes)?

Today we use the Procrustes ANOVA function to check for repeatability and, in doing so, also learn about nested ANOVAs.

Exercise 8 – Examining Replicate Error with procD.lm().


No one is perfect. And neither will our measurements be. But we can take a few precautions to minimise error.

In geometric morphometrics, error can come from many different stages in the data collection process. It is important to assess where in your study error could occur and how to minimise the propagation of error across different stages (e.g. photographing, digitizing, translating data to analysis software), and then to work this into your pilot study.

Here we will look at measurement error and repeatability.

For example, let’s say you are taking photographs of your specimens. They are rather rounded, so it is hard to place them flat on the table to photograph from above. Issue 1 is whether the shape variation we observe in the photos is real or due to placing the specimens at slightly different angles. Then, once you have the photographs, you need to digitize the landmarks. Issue 2 is whether you put the landmarks in the same place every time (i.e. are your criteria for each landmark robust enough that it’s obvious where it should be placed on each specimen, even if you came back to the data a month or a year later?).

In this instance we could take two sets of pictures, each time removing and positioning the specimen. And we could digitize each image twice, preferably in different sessions (another day or week). This would give us 4 sets of landmark data for each specimen.

If it were me, I would label the files:
              Individual_photo1_rep1.jpg 
Where I have the ID of the individual, followed by which picture (photo1 or photo2) and then the digitizing replicate (rep1 or rep2).

To test for differences between landmark sets:

1) Read the coordinate data into R (using geomorph’s functions readland.tps() or readland.nts() for example). 
2) Use gpagen() to perform a Procrustes Superimposition
3) Perform a Procrustes ANOVA in the style:

procD.lm(Y.gpa$coords ~ ind:photo:rep)
# Y.gpa$coords is the 3D array of Procrustes residuals (shape data)
# ind is a vector containing labels for each individual
# photo is a vector designating whether the photo is 1 or 2
# rep is a vector designating whether the replicate is 1 or 2

(Tip! Use strsplit() to make these classifier vectors from the photo names, as we did in Tips & Tricks 5)
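
As a rough sketch of that tip, here is one way to build the three classifier vectors with strsplit(). The file names are made up for illustration, and the code assumes the Individual_photo1_rep1.jpg scheme with no underscores inside the individual ID.

```r
# Hypothetical file names following the Individual_photoN_repN.jpg scheme
files <- c("spec01_photo1_rep1.jpg", "spec01_photo1_rep2.jpg",
           "spec01_photo2_rep1.jpg", "spec01_photo2_rep2.jpg",
           "spec02_photo1_rep1.jpg", "spec02_photo1_rep2.jpg",
           "spec02_photo2_rep1.jpg", "spec02_photo2_rep2.jpg")

# Drop the file extension, then split each name on the underscores
parts <- strsplit(sub("\\.jpg$", "", files), "_")

# Stack the pieces into a matrix: one row per file, one column per label
labels <- do.call(rbind, parts)

# Classifier vectors, in the same order as the files
ind   <- factor(labels[, 1])
photo <- factor(labels[, 2])
rep   <- factor(labels[, 3])

# Quick check of the design: every individual should have 4 landmark sets
table(ind)
```

The order of the labels must match the order of the specimens in the coordinate array, so it is worth building them from the same list of names that was used when the landmarks were read in.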

Note that we use : in the model terms; this means we are performing a nested ANOVA (photos nested within individuals, and replicates nested within photos).

What we are looking for in the resulting ANOVA table are the values in the Mean Squares (MS) column. Compare the values for ind:photo and ind:photo:rep with the value for ind.

To calculate the repeatability of our digitizing ability, we subtract the MS of the replicate term (ind:photo:rep) from the MS of the individual term and divide by two (because we have two replicates):
(MS(ind) - MS(ind:photo:rep))/2
Then we calculate the ratio of this value to the sum of this value and the replicate MS (that is, the among-individual component plus the error):
((MS(ind) - MS(ind:photo:rep))/2) / ((MS(ind) - MS(ind:photo:rep))/2 + MS(ind:photo:rep))

In good circumstances the result is somewhere above 0.95; a repeatability of 0.95 means that 5% of the variation is due to error.
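
Here is a minimal sketch of this arithmetic with made-up MS values. It uses the usual intraclass-correlation form of repeatability, in which the among-individual component is divided by itself plus the error MS. Treat that denominator, the function name repeatability() and the numbers as my assumptions, and check the details against Chapter 9 of the Green Book.

```r
# Repeatability from two mean squares (hypothetical helper, not part of geomorph)
#   ms_ind   : MS of the individual term
#   ms_error : MS of the term treated as error (here ind:photo:rep)
#   n_rep    : number of replicates per individual at that level
repeatability <- function(ms_ind, ms_error, n_rep = 2) {
  # Among-individual variance component
  s2_among <- (ms_ind - ms_error) / n_rep
  # Proportion of variation that is among individuals rather than error
  s2_among / (s2_among + ms_error)
}

# Made-up MS values, for illustration only
ms <- c("ind" = 0.0200, "ind:photo" = 0.0008, "ind:photo:rep" = 0.0004)

r_digitizing <- repeatability(ms[["ind"]], ms[["ind:photo:rep"]])
r_digitizing
```

Swapping in the ind:photo MS as ms_error gives the equivalent value for the photographs, with the caveat that digitizing error is included in that term too.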

The same can be done for the photos (ind:photo) but of course remember digitizing error is also in this term. This post is inspired by Chapter 9 of the Green Book, which I strongly recommend reading.

Remember, all this can be done by accessing the parts of the ANOVA table using regular R indexing. Save the output of procD.lm() in an object, e.g. res; then res[,3] will be the MS values.
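
As a small illustration of that indexing, here is a sketch on a stand-in table. The object res below is a made-up data frame, not real procD.lm() output, and its values and column layout (df, SS, MS) are assumptions chosen so that column 3 holds the MS. In recent geomorph versions the table is, as far as I know, stored inside the returned object as res$aov.table, so the same indexing would apply to that element instead.

```r
# Stand-in for an ANOVA table (made-up numbers, 10 individuals x 2 photos x 2 reps)
res <- data.frame(
  df = c(9, 10, 20),
  SS = c(0.1800, 0.0080, 0.0080),
  MS = c(0.0200, 0.0008, 0.0004),
  row.names = c("ind", "ind:photo", "ind:photo:rep")
)

# Column 3 holds the mean squares
ms <- res[, 3]
names(ms) <- rownames(res)

# Indexing by row and column name is a little safer than by position
ms_ind <- res["ind", "MS"]
ms_rep <- res["ind:photo:rep", "MS"]

# Among-individual component for two replicates
(ms_ind - ms_rep) / 2
```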

Enjoy!

Emma
