use clusterProfiler as an universal enrichment analysis tool
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
clusterProfiler supports enrichment analysis of both hypergeometric test and gene set enrichment analysis. It internally support Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) with all species that have annotation available in KEGG database, DAVID annotation, Disease Ontology and Network of Cancer Genes (via DOSE for human) and Reactome Pathway (via ReactomePA for several species). This is still not enough for users may want to analyze their data with unsupported organisms, slim version of GO, novel functional annotation (eg GO via blastgo and KEGG via KAAS), unsupported ontology/pathway or customized annotation.
clusterProfiler provides enricher function for hypergeometric test and GSEA function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters TERM2GENE and TERM2NAME. As indicated in the parameter names, TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional.
Here I used DisGeNET which contains 381056 associations, between 16666 genes and 13172 diseases, as an example to demonstrate the usage of enricher and GSEA functions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | > require(DOSE) > require(clusterProfiler) > data(geneList) > deg <- names(geneList)[abs(geneList)>2] ## downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz > gda <- read.delim("all_gene_disease_associations.txt") > dim(gda) [1] 415681 9 > head(gda) geneId geneSymbol geneName diseaseId 1 540 ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202 2 540 ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202 3 540 ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202 4 4160 MC4R melanocortin 4 receptor umls:C0028754 5 4160 MC4R melanocortin 4 receptor umls:C0028754 6 3667 IRS1 insulin receptor substrate 1 umls:C0011860 diseaseName score NumberOfPubmeds associationType 1 Hepatolenticular Degeneration 0.9898764 229 GeneticVariation 2 Hepatolenticular Degeneration 0.9898764 229 Biomarker 3 Hepatolenticular Degeneration 0.9898764 229 AlteredExpression 4 Obesity 0.9400000 234 GeneticVariation 5 Obesity 0.9400000 234 Biomarker 6 Diabetes Mellitus, Type 2 0.9077659 106 GeneticVariation source 1 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree 2 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree 3 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree 4 UNIPROT, CTD_human, MGD, RGD, GAD, BeFree 5 UNIPROT, CTD_human, MGD, RGD, GAD, BeFree 6 UNIPROT, CTD_human, MGD, RGD, GAD, BeFree > > disease2gene=gda[, c("diseaseId", "geneId")] > disease2name=gda[, c("diseaseId", "diseaseName")] > x = enricher(deg, TERM2GENE=disease2gene, TERM2NAME=disease2name) > head(summary(x)) ID Description GeneRatio BgRatio umls:C0006142 umls:C0006142 Malignant neoplasm breast 89/196 3324/16666 umls:C0006826 umls:C0006826 NEOPLASM MALIGNANT 113/196 5102/16666 umls:C0678222 umls:C0678222 Breast Carcinoma 83/196 3072/16666 umls:C1458156 umls:C1458156 Recurrent Malignant Neoplasm 47/196 1158/16666 umls:C1458155 umls:C1458155 Breast Neoplasms 60/196 1981/16666 umls:C0600139 umls:C0600139 prostate carcinoma 56/196 2033/16666 pvalue p.adjust qvalue umls:C0006142 4.934860e-16 9.845045e-13 8.249008e-13 umls:C0006826 3.071791e-15 3.064112e-12 2.567371e-12 umls:C0678222 5.894907e-15 3.920113e-12 3.284601e-12 umls:C1458156 3.230248e-14 1.611086e-11 1.349904e-11 umls:C1458155 1.724317e-12 6.880024e-10 5.764664e-10 umls:C0600139 5.025005e-10 1.670814e-07 1.399949e-07 geneID umls:C0006142 4312/10874/991/6280/2305/9493/1062/4605/9833/9133/6279/10403/8685/597/7153/6278/259266/3627/27074/6241/7368/11065/55355/9582/55872/51203/3669/22974/10460/10563/4751/820/27338/890/983/4085/6362/9837/5918/332/3832/6286/5163/2146/3002/50852/7272/2568/9212/51659/1111/9055/4321/3620/6790/891/4174/9232/9370/1602/4629/771/3117/80129/23158/125/1311/5174/4250/2167/652/4036/4137/8839/2066/4693/3158/3169/1408/9547/2922/10647/5304/8614/2625/7802/9/5241/10551 umls:C0006826 4312/10874/55388/991/6280/2305/1062/3868/4605/9833/6279/10403/597/7153/6278/79733/3627/27074/6241/55165/9787/7368/11065/9582/55872/51203/3669/83461/22974/10460/10563/4751/6373/8140/1844/4283/27299/27338/890/9415/983/10232/4085/6362/5080/5918/81620/332/3832/6286/5163/2146/3002/50852/7272/2568/2842/9212/8715/1111/9055/3833/4321/11182/10112/3902/3620/3887/51514/6790/4521/891/8544/24137/10578/9232/9370/1602/3708/9122/10699/4629/771/3117/23158/125/4958/1311/2018/1308/4250/652/80736/4036/8839/2066/4693/3169/1408/9547/2922/1524/1580/10647/5304/8614/2625/7802/11122/9/5241/10551/4969 umls:C0678222 4312/10874/6280/2305/4605/9833/9133/6279/10403/8685/597/7153/6278/259266/3627/27074/6241/55165/11065/55355/9582/55872/51203/3669/10563/4751/820/27338/890/983/4085/6362/9837/5918/332/6286/5163/2146/3002/50852/7272/2568/9212/1111/9055/4321/3620/6790/891/9232/9370/1602/3708/4629/771/3117/23158/125/1311/5174/2532/4250/2167/652/4036/4137/8839/2066/4693/3158/3169/9547/2922/1524/10647/5304/8614/2625/7021/7802/9/5241/10551 umls:C1458156 4312/991/6280/6279/8685/7153/6278/259266/3627/27074/6241/55165/9787/3669/22974/983/10232/4085/5080/332/2146/3002/50852/2568/9212/4321/3620/3887/6790/891/4174/9232/3708/4629/771/3117/23158/730/2018/4036/2066/9547/2625/9/5241/10551/57758 umls:C1458155 4312/10874/2305/4605/9833/9133/10403/597/7153/6278/259266/3627/11065/9582/3669/10563/8140/820/1844/27338/890/9415/983/81620/332/2146/3002/2568/9212/51659/1111/9055/11182/3620/51514/6790/4521/891/8544/9232/9370/1602/771/23158/125/4250/2167/652/8839/2066/3169/10647/5304/8614/2625/7021/9/5241/10551/57758 umls:C0600139 4312/6280/2305/4605/9833/6279/7153/3627/6241/55165/11065/22974/10563/8140/1844/890/983/5080/5918/6286/5163/2146/9212/1111/4321/3620/6790/891/8544/9232/9370/8857/1602/3708/23090/4629/3117/23158/2532/2167/652/80736/4036/3169/9547/2922/11283/1524/5304/8614/2625/79689/11122/9/5241/10551 Count umls:C0006142 89 umls:C0006826 113 umls:C0678222 83 umls:C1458156 47 umls:C1458155 60 umls:C0600139 56 > > barplot(x) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | > y = GSEA(geneList, TERM2GENE=disease2gene, TERM2NAME=disease2name) > head(summary(y)) ID Description setSize enrichmentScore umls:C0001418 umls:C0001418 Adenocarcinoma 1224 0.2431172 umls:C0001973 umls:C0001973 Alcoholism 548 -0.3213479 umls:C0002171 umls:C0002171 Alopecia Areata 74 0.5122605 umls:C0003864 umls:C0003864 Arthritis 447 0.3140411 umls:C0003872 umls:C0003872 Arthritis, Psoriatic 122 0.4311280 umls:C0003873 umls:C0003873 Arthritis, Rheumatoid 1186 0.2855594 pvalue p.adjust qvalues umls:C0001418 0 0 0 umls:C0001973 0 0 0 umls:C0002171 0 0 0 umls:C0003864 0 0 0 umls:C0003872 0 0 0 umls:C0003873 0 0 0 > gseaplot(y, "umls:C0003872") |
Related Posts
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.