align genomic features with phylogenetic tree
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A question on biostars asking how to generate the following figure:
This can be quite easy to implement in ggtree, I can write a geom
layer to layout the alignment. As ggbio already provides many geom
for genomic data and I don’t want to re-invent the wheel, I decided to try ggtree+ggbio
. This is also the beauty of R
that packages complete each others.
Here is a sample data presented in ggbio
vignette:
library(ggbio) library(biovizBase) library(Homo.sapiens) library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene data(genesymbol, package = "biovizBase") wh <- genesymbol[c("BRCA1", "NBR1")] wh <- range(wh, ignore.strand = TRUE) gr.txdb <- crunch(txdb, which = wh) ## change column to model colnames(values(gr.txdb))[4] <- "model" grl <- split(gr.txdb, gr.txdb$tx_id) set.seed(2016-10-25) names(grl) <- sample(LETTERS, size = length(grl), replace = TRUE) > print(grl) GRangesList object of length 32: $J GRanges object with 7 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name <Rle> <IRanges> <Rle> | <factor> <factor> [1] chr17 [41277600, 41277787] + | 61241 uc002idf.3 [2] chr17 [41283225, 41283287] + | 61241 uc002idf.3 [3] chr17 [41284973, 41285154] + | 61241 uc002idf.3 [4] chr17 [41290674, 41292342] + | 61241 uc002idf.3 [5] chr17 [41277788, 41283224] * | 61241 uc002idf.3 [6] chr17 [41283288, 41284972] * | 61241 uc002idf.3 [7] chr17 [41285155, 41290673] * | 61241 uc002idf.3 gene_id model <factor> <factor> [1] 10230 exon [2] 10230 exon [3] 10230 exon [4] 10230 exon [5] 10230 gap [6] 10230 gap [7] 10230 gap $U GRanges object with 3 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name [1] chr17 [41277600, 41277787] + | 61242 uc010czb.2 [2] chr17 [41290674, 41292342] + | 61242 uc010czb.2 [3] chr17 [41277788, 41290673] * | 61242 uc010czb.2 gene_id model [1] 10230 exon [2] 10230 exon [3] 10230 gap $D GRanges object with 9 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name [1] chr17 [41277600, 41277787] + | 61243 uc002idg.3 [2] chr17 [41283225, 41283287] + | 61243 uc002idg.3 [3] chr17 [41290674, 41290939] + | 61243 uc002idg.3 [4] chr17 [41291833, 41292300] + | 61243 uc002idg.3 [5] chr17 [41296745, 41297125] + | 61243 uc002idg.3 [6] chr17 [41277788, 41283224] * | 61243 uc002idg.3 [7] chr17 [41283288, 41290673] * | 61243 uc002idg.3 [8] chr17 [41290940, 41291832] * | 61243 uc002idg.3 [9] chr17 [41292301, 41296744] * | 61243 uc002idg.3 gene_id model [1] 10230 exon [2] 10230 exon [3] 10230 exon [4] 10230 exon [5] 10230 exon [6] 10230 gap [7] 10230 gap [8] 10230 gap [9] 10230 gap ... <29 more elements> ------- seqinfo: 1 sequence from hg19 genome
We can visualize the alignment simply using:
ggplot() + geom_alignment(grl, alpha=.5)
The input data for geom_alignment
is a GRangesList
object, while facet_plot defined in ggtree
expect the input data as a data.frame
. I extend the facet_plot
to work with geom_alignment
. In doing this, I find a bug of the geom_alignment
function and send a patch to Michael. My patch was incorporated in ggbio 1.23.2
.
With the updates of both ggtree
and ggbio
, we can use facet_plot
to align alignment with phylogenetic tree.
Suppose we have the following tree:
library(ggtree) n <- names(grl) %>% unique %>% length set.seed(2016-10-25) tr = rtree(n) set.seed(2016-10-25) tr$tip.label = sample(unique(names(grl)), n) p <- ggtree(tr) + geom_tiplab()
It is quite easy to use facet_plot
to visualize the above alignment with this tree:
facet_plot(p, 'alignment', grl, geom_alignment, inherit.aes=FALSE, mapping=aes())
Beware of mapping=aes()
is required as ggbio
can’t accept mapping=NULL
.
PS: I only test geom_alignment
which works with GRanges
and GRangesList
. Other geoms
defined in ggbio
may not be supported. If you find facet_plot
fail to work with other geoms
, please open an issue in github. Feature requests are welcome.
Citation
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. doi:10.1111/2041-210X.12628
.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.