Site icon R-bloggers

Showcasing the latest phylogenetic methods: AUTEUR

[This article was first published on Carl Boettiger » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

While high-speed fish feeding videos may be the signature of the lab, dig a bit deeper and you’ll find a wealth of comparative phylogenetic methods sneaking in.  It’s a natural union — expert functional morphology is the key to good comparative methods, just as phylogenies hold the key to untangling the evolutionary origins of that morphology.  The lab’s own former graduate, Brian O’Meara, made a revolutionary step forward in the land of phylogenetic methods when he unveiled Brownie in 2006, allowing researchers to identify major shifts in trait diversification rates across the tree.  This work spurred not only a flood of empirical applications but also methodological innovations, such as Liam’s brownie-lite, and today’s focus: Jon Eastman et al.‘s auteur package.

Auteur, short for “Accommodating uncertainty in trait evolution using R,” is the grown-up Bayesian RJMCMC version of that original idea in Brownie.  Diversification rates can change along the phylogenetic tree — only this time, you don’t have to specify where those changes could have occurred, or how many there may have been — auteur simply tries them all.

If you want the details, definitely go read the paper — it’s all there, clear and thorough.  Meanwhile, what we really want to do, is take it out for a test drive.

The package isn’t up on CRAN yet, so you can grab the development version from Jon’s github page, or click here.  Put that package in a working directory and fire up R in that directory.  Let’s go for a spin.

install.packages("auteur_0.11.0612.tar.gz", repos=NULL)
library(auteur)

Great, the package installed and loaded successfully. Looks like Jon’s put all 73 functions into the NAMESPACE, but it’s not hard to guess which one looks like the right one to start with.  rjmcmc.bm.  Yeah, that looks good.  It has a nice help file, with — praise the fish — example code.  Looks like we’re gonna run a simulation, where we know the answer, and see how it does:


     #############
     ## generate tree
     n=24
     while(1) {
             phy=prunelastsplit(birthdeath.tree(b=1,d=0,taxa.stop=n+1))
             phy$tip.label=paste("sp",1:n,sep="")
             rphy=reorder(phy,"pruningwise")

             # find an internal edge
             anc=get.desc.of.node(Ntip(phy)+1,phy)
             branches=phy$edge[,2]
             branches=branches[branches>Ntip(phy) & branches!=anc]
             branch=branches[sample(1:length(branches),1)]
             desc=get.descendants.of.node(branch,phy)
             if(length(desc)>=4) break()
     }
     rphy=phy
     rphy$edge.length[match(desc,phy$edge[,2])]=phy$edge.length[match(desc,phy$edge[,2])]*64

     e=numeric(nrow(phy$edge))
     e[match(c(branch,desc),phy$edge[,2])]=1
     cols=c("red","gray")
     dev.new()
     plot(phy,edge.col=ifelse(e==1,cols[1],cols[2]), edge.width=2)
     mtext("expected pattern of rates")

     #############
     ## simulate data on the 'rate-shifted' tree
     dat=rTraitCont(phy=rphy, model="BM", sigma=sqrt(0.1))

That creates this beautiful example (sorry, no random generator seed, you’re results may vary but that’s ok) tree:


Okay, so that’s the target, showing where the shift occurred.  Note the last line got us some data on this tree.  We’re ready to run the software.  It looks super easy:

## run two short reversible-jump Markov chains
 r=paste(sample(letters,9,replace=TRUE),collapse="")
 lapply(1:2, function(x) rjmcmc.bm(phy=phy, dat=dat, ngen=10000, sample.freq=10, prob.mergesplit=0.1, simplestart=TRUE, prop.width=1, fileBase=paste(r,x,sep=".")))

The data is going in as “phy” and “dat”, just as expected.  We won’t worry about the optional parameters that follow for the moment.  Note that because we use lapply to run multiple chains, it would be super easy to run this on multiple processors.

Note that Jon’s creating a bunch of directories to store parameters, etc.  This can be important for MCMC methods where chains get too cumbersome to handle in memory.  Enough technical rambling, let’s merge and load those files in now, and plot what we got:

# collect directories
dirs=dir("./",pattern=paste("BM",r,sep="."))
pool.rjmcmcsamples(base.dirs=dirs, lab=r)

## view contents of .rda
load(paste(paste(r,"combined.rjmcmc",sep="."),paste(r,"posteriorsamples.rda",sep="."),sep="/"))
print(head(posteriorsamples$rates))
print(head(posteriorsamples$rate.shifts))

## plot Markov sampled rates
dev.new()
shifts.plot(phy=phy, base.dir=paste(r,"combined.rjmcmc",sep="."), burnin=0.5, legend=TRUE, edge.width=2)

# clean-up: unlink those directories
 unlink(dir(pattern=paste(r)),recursive=TRUE)

Not only is that a beautiful plot, but it’s nailed the shift in species 12–16.  How’d your example do?

Auteur comes with three beautiful large data sets described in the paper.  Check them out, but expect longer run times than our simple example!


data(chelonia)
# take a look at this data
> chelonia
$phy
Phylogenetic tree with 226 tips and 225 internal nodes.

Tip labels:
Elseya_latisternum, Chelodina_longicollis, Phrynops_gibbus, Acanthochelys_radiolata, Acanthochelys_macrocephala, Acanthochelys_pallidipectoris, ...

Rooted; includes branch lengths.

$dat
Pelomedusa_subrufa                   Pelusios_williamsi
2.995732                             3.218876
...
dat <- chelonia$dat
phy <- chelonia$phy
## ready to run as above

Thanks Jon and the rest of the Harmon Lab for a fantastic package. This is really just a tip of the iceberg, but should help get you started. See the paper for a good example of posterior analyses requisite after running any kind of MCMC, or stay tuned for a later post.

(cross-posted from my entry on the Wainwright lab blog.)


Eastman, J., Alfaro, M., Joyce, P., Hipp, A., & Harmon, L. (2011). A NOVEL COMPARATIVE METHOD FOR IDENTIFYING SHIFTS IN THE RATE OF CHARACTER EVOLUTION ON TREES Evolution DOI: 10.1111/j.1558–5646.2011.01401.x

< !-- kcite active, but no citations found -->
< !-- kcite-section 2337 -->

To leave a comment for the author, please follow the link and comment on their blog: Carl Boettiger » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.