Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Among-species comparisons can include phylogenetic information to account for non-independence arising from shared evolutionary history. Often, phylogenetic topologies and branch lengths are not known exactly, but are estimated with uncertainty. This uncertainty can be accounted for using methods recently described in a neat paper called Bayesian models for comparative analysis integrating phylogenetic uncertainty by Villemereuil et al. Here, I’ll demonstrate the method by estimating the body mass of the Siberut macaque (Macaca siberu).
(I don’t study macaques, but they’re cute enough to warrant a toy example)
Building the phylogeny
First, I downloaded a nexus file with DNA sequences from A molecular phylogeny of living primates by Polina Perelman et al., available here on TreeBASE. I culled the nexus file to include only the 14 species in the genus Macaca, and saved it as macaques.nex.
I used MrBayes to estimate the macaque phylogeny with the following file (analysis.nex):
1 2 3 4 5 6 7 8 | #nexus begin mrbayes; set autoclose=yes nowarn=yes; execute macaques.nex; lset nst=6 rates=invgamma; mcmc nruns=1 ngen=120000 samplefreq=120 file=macaques.nex1; mcmc file=macaques.nex2; end; |
Calling MrBayes from within R:
1 | system("mb 'analysis.nex' > log.txt &") |
Here is the unrooted consensus tree:
Body mass data
Macaque weight data are available as part of a (much) larger dataset on the body mass of late quaternary mammals, by Smith and colleagues. I extracted the log body mass data for the 13 available macaque species. No body mass data were available for the Siberut macaque (M. siberu).
Body mass model
We will account for phylogenetic non-independence by considering average species weights to be multivariate normally distributed around a within-genus mean, with a covariance matrix $\Sigma$ that reflects phylogenetic distance (see Villemereuil et al.). The off-diagonal elements of this matrix are scaled by Pagel’s $\lambda$, which reflects the degree of phylogenetic signal in the data.
With the help of the ape and MASS packages, covariance and precision matrices can be constructed for the trees comprising the posterior phylogeny.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | require(ape) require(MASS) # load the phylogenies trees1 <- read.nexus("macaques.nex1.t") trees2 <- read.nexus("macaques.nex2.t") K <- length(trees1)*2 # number of trees N <- 14 # number of species cov.mats <- array(dim=c(K, N, N)) inv.mat <- array(dim=c(K, N, N)) for (i in 1:(K/2)){ cov.mats[i, , ] <- vcv(trees1[[i]], corr=T) inv.mat[i,,] <- ginv(cov.mats[i,,]) cov.mats[(K/2+i), , ] <- vcv(trees2[[i]], corr=T) inv.mat[(K/2+i), , ] <- ginv(cov.mats[(K/2+i), , ]) } |
Then we can fit the model with OpenBUGS, estimating the missing body mass of M. siberu.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | # load body mass data d <- read.csv("macaques.csv") # specify model cat(" model{ alpha ~ dnorm(3, .01) for (i in 1:nspec){ mu[i] <- alpha } weight[1:nspec] ~ dmnorm(mu[], TAU[, ]) sigma ~ dunif(0, 10) tau <- 1 / (sigma * sigma) lambda ~ dunif(0, 1) for (k in 1:ntree){ p[k] <- 1/ntree } K ~ dcat(p[]) for (i in 1:nspec){ for (j in 1:nspec){ LAMBDA[i, j] <- 1 + (lambda - 1) * (1 - equals(i, j)) TAU[i, j] <- tau * LAMBDA[i, j] * inv.mat[K, i, j] } } } ", file="phylomod.txt") d bugsd <- list(weight = d$lmass, nspec=nrow(d), ntree = K, inv.mat=inv.mat) # run model out <- bugs(bugsd, inits = NULL, parameters.to.save = c("sigma", "alpha", "K", "weight", "TAU", "lambda"), n.iter=3000, model.file="phylomod.txt", DIC=F, n.thin=5, n.chains=3) out coda.out <- as.mcmc.bugs(out) ggd <- ggs(coda.out) ggs_density(ggd, "weight") + theme_bw() + ggtitle(expression(paste("Predicted body size of ", italic("Macaca siberu")))) + theme(axis.title.y = element_blank()) + xlab("Log(body mass in kg)") |
Here is our phylogenetically informed estimate of the body size for M. siberu. Pagel’s $\lambda$ indicates a weak but non-zero phylogenetic signal with mean = 0.417 and 95% BCI = (0.0156, 0.949). It should go without saying this is a toy example, and it may be better to go out and weigh some actual Siberut macaques (at a minimum, this would be a good excuse for a vacation).
References & further reading
-
Blomberg et al.): Independent contrasts and PGLS regression estimators are equivalent. Systematic Biology 2012.
-
Pagel: Inferring the historical patterns of biological evolution. Nature 1999.
-
Perelman et al.: A molecular phylogeny of living primates. PLoS Genetics 2011.
-
Smith et al.: Body mass of late quaternary mammals. Ecology 2003.
-
de Villemereuil et al.: Bayesian models for comparative analysis integrating phylogenetic uncertainty. BMC Evolutionary Biology 2012
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.