Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Annotating sets of genomic intervals with genomic annotations such as chromHMM
Genomation is an R package to summarize, annotate and visualize genomic intervals. It contains a collection of tools for visualizing and analyzing genome-wide data sets, i.e. RNA-seq, bisulfite sequencing or chromatin-immunoprecipitation followed by sequencing (ChIP-seq) data.
Recently we added new features to genomation The new functionalities are available in the latest version of genomation that can be found on its github website.
This demo shows the new annotation functions in genomation. The functions can be used to annotate target regions or a list of target regions with a given set of genomic features. The genomic features to be used should be in named GRangesList format.
Get data to R
We will get p300, SP1 and Nanog peaks,and chromHMM annotations from ENCODE. We wil use the H1-Esc cells. The aim is to annotate the peaks with chromHMM annotations.
The new functions
here are the new functions annotateWithFeatures
and heatTargetAnnotation
. These functions are available as of version 1.5.6. There were similar functions within the package before but these are more generalized versions of the old ones.
annotateWithFeatures()
will calculate the percentage of overlaps in a given GRanges object with a GRangesList object where each element correspods to a different category of genomic features such as promoters, exons and introns.
heatTargetAnnotation()
will plot a heatmap of annotations returned from annotateWithFeatures()
or can return the matrix that is used the create the heatmap.
Annotate peaks with chromHMM segments
We can annote a given peak set like this:
## summary of target set annotation with feature annotation: ## Rows in target set: 8934 ## ---------------------------- ## percentage of target elements overlapping with features: ## 1_Active_Promoter 10_Txn_Elongation 11_Weak_Txn 12_Repressed ## 28.16 0.07 2.10 1.33 ## 13_Heterochrom/lo 14_Repetitive/CNV 15_Repetitive/CNV 2_Weak_Promoter ## 2.74 0.08 0.18 23.86 ## 3_Poised_Promoter 4_Strong_Enhancer 5_Strong_Enhancer 6_Weak_Enhancer ## 8.69 11.91 16.33 32.65 ## 7_Weak_Enhancer 8_Insulator 9_Txn_Transition ## 8.03 5.72 1.00 ## ## percentage of feature elements overlapping with target: ## 1_Active_Promoter 10_Txn_Elongation 11_Weak_Txn 12_Repressed ## 19.81 0.04 0.16 0.68 ## 13_Heterochrom/lo 14_Repetitive/CNV 15_Repetitive/CNV 2_Weak_Promoter ## 0.27 0.19 0.57 6.76 ## 3_Poised_Promoter 4_Strong_Enhancer 5_Strong_Enhancer 6_Weak_Enhancer ## 6.45 20.36 11.49 3.46 ## 7_Weak_Enhancer 8_Insulator 9_Txn_Transition ## 0.53 0.84 0.71 ##
We can annotate the GRangesList of different peak sets like this:
## Working on: p300 ## Working on: SP1 ## Working on: NANOG
Make a heatmap of percentage of peaks overlapping with different chromHMM annotations
We can make a heatmap using heatTargetAnnotation()
function. The function also returns a ggplot2 object, so it can be further manipulated via ggplot2.
We can also get the matrix that is used to make the heatmap and use it in other heatmap functions.
sessionInfo()
sessionInfo() ## R version 3.3.0 (2016-05-03) ## Platform: x86_64-apple-darwin13.4.0 (64-bit) ## Running under: OS X 10.11.5 (El Capitan) ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] grid stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] devtools_1.11.1 gplots_3.0.1 genomation_1.5.6 ## ## loaded via a namespace (and not attached): ## [1] Rcpp_0.12.6 formatR_1.4 ## [3] GenomeInfoDb_1.8.3 plyr_1.8.4 ## [5] XVector_0.12.1 bitops_1.0-6 ## [7] tools_3.3.0 zlibbioc_1.18.0 ## [9] digest_0.6.10 memoise_1.0.0 ## [11] tibble_1.1 evaluate_0.9 ## [13] gtable_0.2.0 BSgenome_1.40.1 ## [15] gridBase_0.4-7 yaml_2.1.13 ## [17] parallel_3.3.0 withr_1.0.1 ## [19] rtracklayer_1.32.2 stringr_1.0.0 ## [21] knitr_1.13.1 caTools_1.17.1 ## [23] gtools_3.5.0 Biostrings_2.40.2 ## [25] S4Vectors_0.10.2 IRanges_2.6.1 ## [27] stats4_3.3.0 Biobase_2.32.0 ## [29] data.table_1.9.6 impute_1.46.0 ## [31] plotrix_3.6-3 XML_3.98-1.4 ## [33] BiocParallel_1.6.5 seqPattern_1.4.0 ## [35] rmarkdown_0.9.6 gdata_2.17.0 ## [37] reshape2_1.4.1 readr_1.0.0 ## [39] ggplot2_2.1.0 magrittr_1.5 ## [41] codetools_0.2-14 matrixStats_0.50.2 ## [43] Rsamtools_1.24.0 scales_0.4.0 ## [45] htmltools_0.3.5 BiocGenerics_0.18.0 ## [47] GenomicRanges_1.24.2 GenomicAlignments_1.8.4 ## [49] assertthat_0.1 SummarizedExperiment_1.2.3 ## [51] colorspace_1.2-6 labeling_0.3 ## [53] KernSmooth_2.23-15 stringi_1.1.1 ## [55] RCurl_1.95-4.8 munsell_0.4.3 ## [57] chron_2.3-47
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.