NEWS of my BioC packages
Today is my birthday, and it happens to be the release day of Bioconductor 3.3. It’s again time to reflect on what I’ve done in the past year.
ChIPseeker
Although ChIPseeker was designed for ChIP-seq annotation, I am very glad to find that others use it to annotate other data, including copy number variants and DNA breakpoints.
annotatePeak
Several parameters, including sameStrand, ignoreOverlap, ignoreUpstream and ignoreDownstream, were added to annotatePeak. They were requested by @crazyhottommy for using ChIPseeker to annotate breakpoints from whole genome sequencing data.
Another parameter, overlap, was also introduced. With the default overlap="TSS", an overlapping gene is reported as the nearest gene only when the peak overlaps its TSS. If overlap="all", any gene that overlaps the peak is reported as the nearest gene, no matter whether the overlap is at the TSS region or not.
Now annotatePeak also supports annotating data with user-customized regions by passing TxDb=user_defined_GRanges.
getBioRegion
The getPromoters() function prepares a GRanges object of promoter regions from user-specified upstream and downstream distances from the transcription start site (TSS). We can then align the peaks that map to these regions and visualize the profile or heatmap of ChIP binding to the TSS regions.
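To make the idea of promoter windows concrete, here is a minimal base R sketch. It is not ChIPseeker's getPromoters(): the helper promoter_windows, the toy coordinates and the strand-aware flipping of upstream/downstream are all assumptions made for illustration.

```r
# Toy TSS table; coordinates and gene names are made up for illustration.
tss <- data.frame(
  gene   = c("geneA", "geneB"),
  chr    = c("chr1", "chr1"),
  tss    = c(10000, 50000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not ChIPseeker's getPromoters): build a window around
# each TSS, flipping upstream/downstream for genes on the minus strand.
promoter_windows <- function(tss, upstream = 3000, downstream = 3000) {
  plus <- tss$strand == "+"
  data.frame(
    gene   = tss$gene,
    chr    = tss$chr,
    start  = ifelse(plus, tss$tss - upstream, tss$tss - downstream),
    end    = ifelse(plus, tss$tss + downstream, tss$tss + upstream),
    strand = tss$strand,
    stringsAsFactors = FALSE
  )
}

windows <- promoter_windows(tss, upstream = 2000, downstream = 500)

# A peak "maps to" a promoter window when the two intervals overlap.
peak <- list(chr = "chr1", start = 8500, end = 8700)
hits <- windows$gene[windows$chr == peak$chr &
                     windows$start <= peak$end &
                     windows$end >= peak$start]
print(windows)
print(hits)
```

In the package itself the windows are GRanges objects and the overlap is computed on those; the data frame here only keeps the sketch free of dependencies.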
Users (1 and 2) are interested in the intensity of peaks binding to the start of introns/exons, and ChIPseeker provides a new function, getBioRegion, to output a GRanges object of intron/exon start regions.
visualization
upsetplot was implemented to visualize ChIP annotation overlap. covplot now supports GRangesList.
GEO data mining
ChIPseeker incorporates the GEO database and supports data mining to infer cooperative regulation. The data was updated, and ChIPseeker now contains information on 19348 BED files.
clusterProfiler
We compared clusterProfiler with GSEA-P (released by the Broad Institute); the p-values calculated by the two programs are almost identical.
For comparing biological themes, clusterProfiler supports formulas to express complex conditions, and faceting is supported to visualize complex results.
The read.gmt function was added for parsing the GMT file format from the Molecular Signatures Database, so that gene set collections in this database can be used in clusterProfiler for both the hypergeometric test and GSEA.
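For readers unfamiliar with GMT, the format is one gene set per line: set name, a description, then the member genes, all tab-separated. The base R sketch below parses such lines and runs a hypergeometric (over-representation) test on them. The helpers parse_gmt and ora, the gene sets and the gene universe are hypothetical; this is not how read.gmt or clusterProfiler's enrichment functions are implemented.

```r
# Three made-up gene sets in GMT layout: name <TAB> description <TAB> genes...
gmt_text <- c(
  "SET_A\tdemo set A\tG1\tG2\tG3\tG4",
  "SET_B\tdemo set B\tG3\tG5\tG6",
  "SET_C\tdemo set C\tG7\tG8"
)

# Hypothetical parser (not clusterProfiler's read.gmt): returns a named list
# mapping each gene set name to its member genes.
parse_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(x) x[-(1:2)])
  names(sets) <- vapply(fields, function(x) x[1], character(1))
  sets
}

gene_sets <- parse_gmt(gmt_text)

# Over-representation test: probability of seeing at least k genes from a set
# among n selected genes, drawn from a universe of N genes with M set members.
universe <- paste0("G", 1:10)
selected <- c("G1", "G2", "G3")

ora <- function(set, selected, universe) {
  set <- intersect(set, universe)
  k <- length(intersect(set, selected))
  M <- length(set)
  N <- length(universe)
  n <- length(selected)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

pvalues <- vapply(gene_sets, ora, numeric(1),
                  selected = selected, universe = universe)
print(pvalues)
```

The `k - 1` in the phyper call is what turns the upper tail into "at least k" rather than "more than k".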
KEGG Module is now supported just like KEGG Pathway; clusterProfiler queries the online annotation data, which keeps the annotation data always up to date.
The KEGG database is updated quite frequently. KEGG.db, which has not been updated since 2012, contains annotation of 5894 human genes. In Feb. 2015, when clusterProfiler first supported querying online KEGG data, KEGG contained annotation of 6861 human genes, and today it has 7018 human genes annotated. Most tools/webservers use outdated data (e.g. DAVID has not been updated since 2010), and the analysis results may change completely if we use recently updated data. clusterProfiler is more reliable in this respect, as we always use the latest data.
In addition to the bitr function, which translates biological IDs using an OrgDb object, we provide bitr_kegg, which uses the KEGG API for translating biological IDs. It supports more than 4000 species (which can be searched via the search_kegg_organism function), as in KEGG Pathway and Module analysis.
The function calls of enrichGO and gseGO were changed. Now not only species that have an OrgDb available in Bioconductor can be analyzed, but all species that have an OrgDb, which can be queried online via AnnotationHub or built with the user’s own data. With this update, enrichGO and gseGO accept any gene ID type, as long as the ID type is supported by the OrgDb.
GO enrichment analysis always outputs redundant terms, so we implemented a simplify function to remove redundant terms by calculating GO semantic similarity using GOSemSim. Several useful utilities, including dropGO, go2ont, go2term, gofilter and gsfilter, are also provided.
I bumped the version to 3.0.0 for the following three reasons:
- the changes of function calls
- can analyze any ontology/pathway annotation
- can analyze all species
Although the package was very simple when I first published it, I keep updating it and adding new features from my own ideas or users’ requests. Now this package is indeed in good shape. Here is the summary.
This package implements methods to analyze and visualize functional profiles of genomic coordinates (supported by ChIPseeker), gene and gene clusters.
clusterProfiler supports both hypergeometric test and Gene Set Enrichment Analysis for many ontologies/pathways, including:
- Disease Ontology (via DOSE)
- Network of Cancer Gene (via DOSE)
- Gene Ontology (supports many species with GO annotation query online via AnnotationHub)
- KEGG Pathway and Module with latest online data (supports more than 2000 species listed in http://www.genome.jp/kegg/catalog/org_list.html)
- Reactome Pathway (via ReactomePA)
- DAVID (via RDAVIDWebService)
- Molecular Signatures Database
  - hallmark gene sets
  - positional gene sets
  - curated gene sets
  - motif gene sets
  - computational gene sets
  - GO gene sets
  - oncogenic signatures
  - immunologic signatures
- Other Annotations
  - from other sources (e.g. DisGeNET as an example)
  - user’s annotation
  - customized ontology
  - and many others
clusterProfiler also provides several visualization methods to help interpret enriched results, including:
- barplot
- cnetplot
- dotplot
- enrichMap
- gseaplot
- plotGOgraph (via topGO package)
- upsetplot
and several useful utilities:
- bitr (Biological Id TranslatoR)
- bitr_kegg (bitr using KEGG source)
- compareCluster (biological theme comparison)
- dropGO (screen out GO term of specific level or specific term)
- go2ont (convert GO ID to Ontology)
- go2term (convert GO ID to descriptive term)
- gofilter (restrict result at specific GO level)
- gsfilter (restrict result by gene set size)
- search_kegg_organism (search kegg supported organism)
- setReadable (convert IDs stored in enrichResult object to gene symbol)
- simplify (remove redundant GO terms, supported via GOSemSim)
DOSE
DOSE now tests bimodal distributions separately in GSEA, and the output p-values are more conservative.
A maxGSSize parameter was added, with a default value of 500. Usually, if a gene set has more than 500 genes, its probability of being called significant by GSEA rises quite dramatically.
A gsfilter function was added for restricting enriched results by minimal and maximal gene set sizes.
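The size restriction itself is simple to picture. The following base R sketch keeps only gene sets within a size range; filter_by_size and the toy gene sets are hypothetical, the lower bound of 10 is an arbitrary choice for the example, and only the upper bound of 500 comes from the text above. It is not the gsfilter implementation.

```r
# Made-up gene sets of different sizes.
gene_sets <- list(
  tiny   = paste0("G", 1:3),
  medium = paste0("G", 1:50),
  huge   = paste0("G", 1:800)
)

# Hypothetical helper (not DOSE's gsfilter): keep sets whose size lies
# within [min_size, max_size].
filter_by_size <- function(sets, min_size = 10, max_size = 500) {
  sizes <- lengths(sets)
  sets[sizes >= min_size & sizes <= max_size]
}

kept <- filter_by_size(gene_sets)
print(names(kept))
```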
upsetplot was implemented to visualize the overlap of enriched gene sets.
The dot sizes in enrichMap are now scaled by category sizes.
ggtree
I put more effort into extending ggtree than into all the other packages combined. The major new features are listed here, while small improvements and bug fixes can be found in the NEWS file.
IO
- support NHX file format via read.nhx function
- support phylip tree format via read.phylip function
- raxml2nwk for converting raxml bootstrap tree to newick text
- all parser functions support passing textConnection(text_string) as a file
- support ape bootstrap analysis
- support annotating tree with ancestral sequences inferred by phangorn
- support obkData object defined by OutbreakTools package
- support phyloseq object defined by phyloseq package
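The textConnection item in the list above relies on a base R feature: a text connection lets code that expects a file read from a string instead. A minimal sketch with a made-up Newick string follows; it uses only base R and does not call any ggtree parser, and the regular expression for pulling out tip labels is a crude assumption that only suits this toy tree.

```r
# Made-up Newick tree kept in a string rather than in a file.
newick <- "((A:1,B:2):0.5,C:3);"

# Open the string as a connection and read it as if it were a file.
con <- textConnection(newick)
tree_text <- readLines(con)
close(con)

# Crude tip extraction for this toy tree: labels preceded by "(" or ",".
tips <- regmatches(
  tree_text,
  gregexpr("(?<=[(,])[A-Za-z0-9_]+", tree_text, perl = TRUE)
)[[1]]
print(tips)
```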
layers
- geom_point2, geom_text2, geom_segment2 and geom_label2 to support subsetting
- geom_treescale for adding scale of branch length
- geom_cladelabel for labeling selected clade
- geom_tiplab2 for adding tip labels of circular tree
- geom_taxalink for connecting related taxa
- geom_range for adding range to present uncertainty of branch lengths
- subview and inset now support annotating with image files
utilities
- rescale_tree function to rescale branch lengths using numerical variable
- MRCA for finding Most Recent Common Ancestor among a vector of tips
- viewClade to zoom in a selected clade
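As an illustration of what finding a most recent common ancestor involves, here is a base R sketch on a toy tree ((A,B),(C,D)) stored as a parent/child edge matrix, with tips numbered 1 to 4 and the root numbered 5. The helpers path_to_root and mrca_of are hypothetical and are not ggtree's MRCA implementation.

```r
# Edge matrix: column 1 is the parent node, column 2 the child node.
edge <- matrix(c(5, 6,
                 6, 1,
                 6, 2,
                 5, 7,
                 7, 3,
                 7, 4), ncol = 2, byrow = TRUE)
tip_labels <- c("A", "B", "C", "D")

# Path from a node up to the root, starting with the node itself.
path_to_root <- function(node, edge) {
  path <- node
  repeat {
    parent <- edge[edge[, 2] == node, 1]
    if (length(parent) == 0) break  # reached the root
    path <- c(path, parent)
    node <- parent
  }
  path
}

# Hypothetical helper (not ggtree's MRCA): the first node on one tip's path
# that also lies on every other tip's path is the most recent common ancestor.
mrca_of <- function(tips, edge, tip_labels) {
  nodes <- match(tips, tip_labels)
  paths <- lapply(nodes, path_to_root, edge = edge)
  common <- Reduce(intersect, paths)
  common[1]
}

print(mrca_of(c("A", "B"), edge, tip_labels))
print(mrca_of(c("A", "C", "D"), edge, tip_labels))
```

Because intersect keeps the order of its first argument, the shared nodes stay ordered from tip to root, so the first one is the most recent.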
vignettes
Split the long vignette into several small ones and added more examples.
- ggtree
- Tree Data Import
- Tree Visualization
- Tree Annotation
- Tree Manipulation
- Advance Tree Annotation
Here is the NEWS record:
CHANGES IN VERSION 1.3.16
------------------------
 o geom_treescale() supports family argument <2016-04-27, Wed>
   + https://github.com/GuangchuangYu/ggtree/issues/56
 o update fortify.phylo to work with phylo that has missing value of edge length <2016-04-21, Thu>
   + https://github.com/GuangchuangYu/ggtree/issues/54
 o support passing textConnection(text_string) as a file <2016-04-21, Thu>
   + contributed by Casey Dunn
   + https://github.com/GuangchuangYu/ggtree/pull/55#issuecomment-212859693

CHANGES IN VERSION 1.3.15
------------------------
 o geom_tiplab2 supports parameter hjust <2016-04-18, Mon>
 o geom_tiplab and geom_tiplab2 support using geom_label2 by passing geom="label" <2016-04-07, Thu>
 o geom_label2 that support subsetting <2016-04-07, Thu>
 o geom_tiplab2 for adding tip label of circular layout <2016-04-06, Wed>
 o use plot$plot_env to access ggplot2 parameter <2016-04-06, Wed>
 o geom_taxalink for connecting related taxa <2016-04-01, Fri>
 o geom_range for adding range of HPD to present uncertainty of evolutionary inference <2016-04-01, Fri>

CHANGES IN VERSION 1.3.14
------------------------
 o geom_tiplab works with NA values, compatible with collapse <2016-03-05, Sat>
 o update theme_tree2 due to the issue of https://github.com/hadley/ggplot2/issues/1567 <2016-03-05, Sat>
 o offset works in `align=FALSE` with `annotation_image` function <2016-02-23, Tue>
   + see https://github.com/GuangchuangYu/ggtree/issues/46
 o subview and inset now supports annotating with img files <2016-02-23, Tue>

CHANGES IN VERSION 1.3.13
------------------------
 o add example of rescale_tree function in treeAnnotation.Rmd <2016-02-07, Sun>
 o geom_cladelabel works with collapse <2016-02-07, Sun>
   + see https://github.com/GuangchuangYu/ggtree/issues/38

CHANGES IN VERSION 1.3.12
------------------------
 o exchange function name of geom_tree and geom_tree2 <2016-01-25, Mon>
 o solved issues of geom_tree2 <2016-01-25, Mon>
   + https://github.com/hadley/ggplot2/issues/1512
 o colnames_level parameter in gheatmap <2016-01-25, Mon>
 o raxml2nwk function for converting raxml bootstrap tree to newick format <2016-01-25, Mon>

CHANGES IN VERSION 1.3.11
------------------------
 o solved issues of geom_tree2 <2016-01-25, Mon>
   + https://github.com/GuangchuangYu/ggtree/issues/36
 o change compute_group() to compute_panel in geom_tree2() <2016-01-21, Thu>
   + fixed issue, https://github.com/GuangchuangYu/ggtree/issues/36
 o support phyloseq object <2016-01-21, Thu>
 o update geom_point2, geom_text2 and geom_segment2 to support setup_tree_data <2016-01-21, Thu>
 o implement geom_tree2 layer that support duplicated node records via the setup_tree_data function <2016-01-21, Thu>
 o rescale_tree function for rescaling branch length of tree object <2016-01-20, Wed>
 o upgrade set_branch_length, now branch can be rescaled using feature in extraInfo slot <2016-01-20, Wed>

CHANGES IN VERSION 1.3.10
------------------------
 o remove dependency of gridExtra by implementing multiplot function instead of using grid.arrange <2016-01-20, Wed>
 o remove dependency of colorspace <2016-01-20, Wed>
 o support phylip tree format and update vignette of phylip example <2016-01-15, Fri>

CHANGES IN VERSION 1.3.9
------------------------
 o optimize getYcoord <2016-01-14, Thu>
 o add 'multiPhylo' example in 'Tree Visualization' vignette <2016-01-13, Wed>
 o viewClade, scaleClade, collapse, expand, rotate, flip, get_taxa_name and scale_x_ggtree accepts input tree_view=NULL. these function will access the last plot if tree_view=NULL. <2016-01-13, Wed>
   + > ggtree(rtree(30)); viewClade(node=35) works. no need to pipe.

CHANGES IN VERSION 1.3.8
------------------------
 o add example of viewClade in 'Tree Manipulation' vignette <2016-01-13, Wed>
 o add viewClade function <2016-01-12, Tue>
 o support obkData object defined by OutbreakTools <2016-01-12, Tue>
 o update vignettes <2016-01-07, Thu>
 o 05 advance tree annotation vignette <2016-01-04, Mon>
 o export theme_inset <2016-01-04, Mon>
 o inset, nodebar, nodepie functions <2015-12-31, Thu>

CHANGES IN VERSION 1.3.7
------------------------
 o split the long vignette to several vignettes
   + 00 ggtree <2015-12-29, Tue>
   + 01 tree data import <2015-12-28, Mon>
   + 02 tree visualization <2015-12-28, Mon>
   + 03 tree manipulation <2015-12-28, Mon>
   + 04 tree annotation <2015-12-29, Tue>

CHANGES IN VERSION 1.3.6
------------------------
 o MRCA function for finding Most Recent Common Ancestor among a vector of tips <2015-12-22, Tue>
 o geom_cladelabel: add bar and label to annotate a clade <2015-12-21, Mon>
   - remove annotation_clade and annotation_clade2 functions.
 o geom_treescale: tree scale layer. (add_legend was removed) <2015-12-21, Mon>

CHANGES IN VERSION 1.3.5
------------------------
 o bug fixed, read.nhx now works with scientific notation <2015-11-30, Mon>
   + see https://github.com/GuangchuangYu/ggtree/issues/30

CHANGES IN VERSION 1.3.4
------------------------
 o rename beast feature when name conflict with reserve keywords (label, branch, etc) <2015-11-27, Fri>
 o get_clade_position function <2015-11-26, Thu>
   + https://github.com/GuangchuangYu/ggtree/issues/28
 o get_heatmap_column_position function <2015-11-25, Wed>
   + see https://github.com/GuangchuangYu/ggtree/issues/26
 o support NHX (New Hampshire X) format via read.nhx function <2015-11-17, Tue>
 o bug fixed in extract.treeinfo.jplace <2015-11-17, Thu>

CHANGES IN VERSION 1.3.3
------------------------
 o support color=NULL in gheatmap, then no colored line will draw within the heatmap <2015-10-30, Fri>
 o add `angle` for also rectangular, so that it will be available for layout='rectangular' following by coord_polar() <2015-10-27, Tue>

CHANGES IN VERSION 1.3.2
------------------------
 o update vignette, add example of ape bootstrap and phangorn ancestral sequences <2015-10-26, Mon>
 o add support of ape bootstrap analysis <2015-10-26, Mon>
   see https://github.com/GuangchuangYu/ggtree/issues/20
 o add support of ancestral sequences inferred by phangorn <2015-10-26, Mon>
   see https://github.com/GuangchuangYu/ggtree/issues/21

CHANGES IN VERSION 1.3.1
------------------------
 o change angle to angle + 90, so that label will in radial direction <2015-10-22, Thu>
   + see https://github.com/GuangchuangYu/ggtree/issues/17
 o na.rm should be always passed to layer(), fixed it in geom_hilight and geom_text2 <2015-10-21, Wed>
   + see https://github.com/hadley/ggplot2/issues/1380
 o matching beast stats with tree using internal node number instead of label <2015-10-20, Tue>
GOSemSim
Updated IC data using the updated OrgDb packages.
ReactomePA
The internal implementation was updated according to the changes in DOSE.
We published ReactomePA in Molecular BioSystems.