[This article was first published on distributed ecology, and kindly contributed to R-bloggers.]
Figure: Le Grand Casino of Monte Carlo
A simple integration example
Let’s start with a trivial example, integrating a function we use all the time as ecologists, the normal distribution. Maybe you want to integrate the normal probability density function (PDF) from -1 to 1, because you’re curious about how likely an event within 1 standard deviation is. To get the area under the curve we simply integrate the PDF from -1 to 1.
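The integration described above can be sketched with a "hit-or-miss" Monte Carlo estimate. This is a minimal illustration in Python, not the code from the original post: the function names, the number of points, and the seed are all assumptions chosen for the example. It scatters random points over a box that encloses the curve between -1 and 1, then multiplies the box area by the fraction of points that land under the PDF.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mc_integrate_normal(lower=-1.0, upper=1.0, n_points=100_000, seed=42):
    """Hit-or-miss Monte Carlo estimate of the area under the standard normal PDF.

    Assumes the interval [lower, upper] contains 0, so the PDF peaks at x = 0
    and normal_pdf(0) is a valid height for the bounding box.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    y_max = normal_pdf(0.0)
    hits = 0
    for _ in range(n_points):
        x = rng.uniform(lower, upper)
        y = rng.uniform(0.0, y_max)
        if y <= normal_pdf(x):  # the point falls under the curve
            hits += 1
    box_area = (upper - lower) * y_max
    return box_area * hits / n_points


estimate = mc_integrate_normal()
exact = math.erf(1.0 / math.sqrt(2.0))  # closed-form P(-1 < Z < 1) for comparison
print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Exact value:          {exact:.4f}")
```

The estimate should land close to the familiar 68% of probability mass within one standard deviation, and it tightens as `n_points` grows.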
Figure: MC integration of the normal PDF between -1 and 1
A simple statistical example
Another place we can use these methods is in statistical hypothesis testing. The simplest case is as an alternative to a t-test. Imagine you have a data set with measurements of plant height for the same species in shaded and unshaded conditions. Your data might look like this:
Treatment | Height
Shaded | 13.3
Shaded | 12.1
Shaded | 14.7
Shaded | 12.8
Unshaded | 17.8
Unshaded | 19.4
Unshaded | 18.5
Unshaded | 18.5
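One way to test for a difference between the two groups without a t-test is a randomization (permutation) test: shuffle the heights across the treatment labels many times and ask how often a random split produces a difference in means at least as large as the observed one. The sketch below is a minimal Python illustration, not the original post's code. The function names, the number of permutations, the seed, and the choice of a two-sided test are assumptions made for the example.

```python
import random

# Plant heights from the table above
shaded = [13.3, 12.1, 14.7, 12.8]
unshaded = [17.8, 19.4, 18.5, 18.5]


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided Monte Carlo permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign heights to treatments
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        # small tolerance guards against floating-point rounding in ties
        if diff >= observed - 1e-9:
            extreme += 1
    # add one to numerator and denominator so the observed data counts as
    # one of the permutations and the p-value is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


observed_diff, p_value = permutation_test(shaded, unshaded)
print(f"Observed difference in means: {observed_diff:.3f}")
print(f"Permutation p-value:          {p_value:.4f}")
```

With only four plants per group there are just 70 distinct ways to split the eight heights, and here every unshaded plant is taller than every shaded one, so the Monte Carlo p-value should settle near 2/70 (about 0.03) no matter how many permutations are run.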
“All that code for a t-test?”
That must be what you’re thinking, and you’re right: it’s certainly unwieldy to write all that code for something so simple. But it’s a good starting point for when we begin talking about null models. You may not have realized it, but in the previous example there are two assumptions behind our inference. The first is that some process or mechanism has caused a difference between our groups. The fact that plants grow to different heights in shaded and unshaded conditions says something about the way plants use light, or the way they compete for light, or maybe some other mechanism I haven’t thought about. The second is that by randomizing our existing data, we can simulate a situation where we have collected the data under completely stochastic conditions, i.e. the process causing plant height is random. So there are our two assumptions: (A) our data set represents the outcome of some process, and (B) by randomizing we can create a null model without process to test our own data against.

Here’s where things become a bit more gray. In our example above the null hypothesis is pretty clear, and we can all agree on it, but problems arise with more complicated questions. Traditionally, null models have been used to make inferences about community assembly rules. The use of these models was prevalent during the “battles” constituting what is tongue-in-cheek called the “null model wars”. I won’t take up any space rehashing the null model wars of the ’70s and ’80s, but links to good resources can be found here on Jeremy Fox’s Oikos blog and his post about refighting the null model wars. Suffice it to say that careful attention needs to be paid to the selection of the proper null model.

Nick Gotelli has lots of good papers about null models if you peruse his work. I’ve worked up several examples from his 2000 and 2010 papers, and sometimes the algorithms can be challenging. I’ll cover more advanced methods in a future post going over some methods from Nick’s papers.