Using meshes for MeSH Enrichment Analysis
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
MeSH (Medical Subject Headings) is the controlled vocabulary of the NLM
(U.S. National Library of Medicine), used to manually index articles for
MEDLINE/PubMed. MeSH is a comprehensive life-science vocabulary. It has
19 categories, of which MeSH.db contains the following 16:
| Abbreviation | Category |
|---|---|
| A | Anatomy |
| B | Organisms |
| C | Diseases |
| D | Chemicals and Drugs |
| E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
| F | Psychiatry and Psychology |
| G | Phenomena and Processes |
| H | Disciplines and Occupations |
| I | Anthropology, Education, Sociology and Social Phenomena |
| J | Technology and Food and Beverages |
| K | Humanities |
| L | Information Science |
| M | Persons |
| N | Health Care |
| V | Publication Type |
| Z | Geographical Locations |
MeSH terms are associated with Entrez Gene IDs by three methods:
Gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).
| Method | How Entrez Gene IDs are linked to MeSH IDs |
|---|---|
| Gendoo | Text mining |
| gene2pubmed | Manual curation by NCBI teams |
| RBBH | Sequence homology with BLASTP search (E-value < 1e-50) |
meshes supports enrichment analysis (over-representation analysis and
gene set enrichment analysis) of a gene list or a whole expression
profile using MeSH annotation. All three data sources (gendoo,
gene2pubmed and RBBH) are supported, and users can select the category
of interest to test; all 16 categories are available. The analysis
covers more than 70 species, listed in the MeSHDb biocView.
```r
library(meshes)

# geneList is an example data set shipped with DOSE: a named numeric
# vector, sorted in decreasing order and named by Entrez Gene ID
data(geneList, package = "DOSE")
de <- names(geneList)[1:100]   # top 100 genes as the gene list of interest

# Over-representation test: gendoo data source, C (Diseases) category
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database = 'gendoo', category = 'C')
head(x)
##              ID              Description GeneRatio   BgRatio       pvalue
## D043171 D043171  Chromosomal Instability     16/96 198/16528 2.794765e-14
## D000782 D000782               Aneuploidy     17/96 320/16528 3.866830e-12
## D042822 D042822      Genomic Instability     16/96 312/16528 3.007419e-11
## D012595 D012595    Scleroderma, Systemic     11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms     11/96 314/16528 2.049315e-06
## D019698 D019698     Hepatitis C, Chronic     11/96 317/16528 2.246856e-06
##             p.adjust       qvalue
## D043171 2.434241e-11 1.794534e-11
## D000782 1.684004e-09 1.241456e-09
## D042822 8.731539e-09 6.436931e-09
## D012595 1.404343e-04 1.035288e-04
## D009303 3.261686e-04 2.404530e-04
## D019698 3.261686e-04 2.404530e-04
##         geneID
## D043171 4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
##         Count
## D043171    16
## D000782    17
## D042822    16
## D012595    11
## D009303    11
## D019698    11
```
In the over-representation analysis above, we use the gendoo data source and the C (Diseases) category.
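The GeneRatio and BgRatio columns carry the counts that the over-representation test works from. A minimal sketch, assuming meshes uses the same one-sided hypergeometric test as DOSE and clusterProfiler (this is an assumption, not read from the meshes source): for the top term, 16 of the 96 annotated input genes hit the term, versus 198 of the 16528 annotated background genes. The variable names `k`, `n`, `M` and `N` are illustrative only.

```r
# Sketch (assumed test): hypergeometric over-representation p-value for
# D043171 "Chromosomal Instability", using its GeneRatio and BgRatio.
k <- 16     # input genes annotated with the term    (GeneRatio numerator)
n <- 96     # input genes with any MeSH annotation   (GeneRatio denominator)
M <- 198    # background genes annotated with the term (BgRatio numerator)
N <- 16528  # background genes with any annotation     (BgRatio denominator)

# P(X >= k) for X ~ Hypergeometric(M term genes, N - M other genes, n draws)
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same upper tail written out as an explicit sum of probabilities
p_tail <- sum(dhyper(k:min(n, M), M, N - M, n))

# Fold enrichment: observed ratio relative to the background ratio
fold <- (k / n) / (M / N)

print(c(p = p, p_tail = p_tail, fold = fold))
```

Under the null, about `n * M / N` (roughly 1.15) of the 96 genes would be expected to hit this term, so observing 16 yields a very small p-value. If the assumption about the test holds, `p` should agree with the pvalue column reported by `enrichMeSH`.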
In the following example, we use the gene2pubmed data source and test category G (Phenomena and Processes) with GSEA.
```r
# GSEA on the whole ranked geneList: gene2pubmed data source, category G
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
## [1] "preparing geneSet collections..."
## [1] "GSEA analysis..."
## [1] "leading edge analysis..."
## [1] "done..."
head(y)
##              ID                  Description setSize enrichmentScore
## D009929 D009929                   Organ Size     449      -0.3458797
## D059647 D059647 Gene-Environment Interaction     455      -0.3551242
## D009043 D009043               Motor Activity     398      -0.3391521
## D050156 D050156                 Adipogenesis     368      -0.3618413
## D004041 D004041                 Dietary Fats     314      -0.3427588
## D006339 D006339                   Heart Rate     312      -0.3695689
##               NES      pvalue   p.adjust    qvalues rank
## D009929 -1.524164 0.001248439 0.03715088 0.02756207 2309
## D059647 -1.564984 0.001251564 0.03715088 0.02756207 2237
## D009043 -1.483672 0.001256281 0.03715088 0.02756207 1757
## D050156 -1.577000 0.001256281 0.03715088 0.02756207 2207
## D004041 -1.473730 0.001269036 0.03715088 0.02756207 1684
## D006339 -1.588315 0.001270648 0.03715088 0.02756207 2405
##                            leading_edge
## D009929 tags=27%, list=18%, signal=22%
## D059647 tags=26%, list=18%, signal=22%
## D009043 tags=21%, list=14%, signal=18%
## D050156 tags=26%, list=18%, signal=22%
## D004041 tags=21%, list=13%, signal=19%
## D006339 tags=29%, list=19%, signal=24%
##         core_enrichment
## D009929 154/9846/3315/6716/9732/5139/7337/5530/4086/6532/1499/7157/627/2252/22891/2908/8654/4088/22846/4057/860/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/79789/1907/7048/1831/4060/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/3479/10451/9370/125/4857/1308/2167/652/57502/4137/8614/5241
## D059647 9497/118/8859/6532/23405/7424/2295/7157/8631/627/2774/22891/2908/4088/51151/11132/1387/860/268/7366/2104/4153/29119/3791/1543/3643/22841/1129/5624/3240/3174/3350/5590/55304/55213/1548/2169/196/8204/8863/5021/23284/9162/11005/4256/3426/84159/5334/629/1793/4208/4322/7048/6817/553/56172/3953/22795/2638/210/5243/5468/1393/1012/27136/51314/4023/5172/4319/4214/3952/5577/126/7832/79068/4313/2944/9369/3075/6720/7494/2099/857/57161/9223/4306/79750/4035/4915/10443/5744/5654/100126791/3551/2487/1746/185/2952/6935/4128/4059/4582/27324/9358/64084/7166/6505/9370/3708/3117/80129/125/5105/2018/2167/652/4137/1524/5241
## D009043 23621/3082/1291/2915/1543/7466/3240/3350/55304/181/2169/27306/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/1277/3953/4747/2247/6414/210/4744/5468/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D050156 5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/7474/6776/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/54795/5950/79365/2247/5468/50507/6469/8553/4023/594/7350/81029/3952/79068/5733/4313/10468/10628/6720/11213/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/54829/2625/79689/10974
## D004041 3554/4925/22841/7466/2181/3350/201134/181/2169/948/55911/324/4018/3426/3087/6785/2308/1581/56172/3953/1384/5950/2166/60481/5468/5166/50507/1012/27136/4023/7056/4214/9365/7350/3952/3778/79068/8864/2944/6720/5159/3991/2203/2819/9223/4035/32/213/165/347/2152/185/3487/5327/3667/54898/150/64084/3479/9370/5105/5174/2018/5346/7021/79689
## D006339 4985/7139/8929/3784/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/2034/4179/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/9607/3572/150/3479/2006/55259/9370/125/652/55351
```
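The enrichmentScore column comes from the GSEA running-sum statistic: walking down the ranked gene list, the score rises at genes in the MeSH term's gene set and falls at genes outside it. The sketch below is a simplified illustration of the weighted running sum described by Subramanian et al. (2005), not the meshes or DOSE implementation itself; the helper name `gsea_es` and the toy ranking are hypothetical.

```r
# Hypothetical helper: weighted GSEA running-sum enrichment score.
# stats: named numeric vector (gene -> ranking statistic)
# gene_set: character vector of gene names in the set
# p: weight exponent (p = 1 weights hits by |statistic|)
gsea_es <- function(stats, gene_set, p = 1) {
  stats <- sort(stats, decreasing = TRUE)   # rank genes, highest first
  hit <- names(stats) %in% gene_set         # which ranked genes are in the set
  n_miss <- length(stats) - sum(hit)
  w <- abs(stats)^p
  # Hits step up in proportion to |stat|^p (summing to 1 over all hits);
  # misses step down by 1 / n_miss (also summing to 1 in total)
  step <- ifelse(hit, w / sum(w[hit]), -1 / n_miss)
  running <- cumsum(step)
  # ES is the running sum's largest deviation from zero, keeping its sign
  es <- if (max(running) > -min(running)) max(running) else min(running)
  list(es = es, running = running)
}

# Toy ranking: a set at the top scores near +1, at the bottom near -1
stats <- c(a = 3, b = 2, c = 1, d = -1, e = -2, f = -3)
top <- gsea_es(stats, c("a", "b"))$es
bottom <- gsea_es(stats, c("e", "f"))$es
print(c(top = top, bottom = bottom))
```

A negative score, as for every term in the table above, means the term's genes are concentrated toward the bottom of the ranked `geneList`. The real analysis then normalizes the score (the NES column) and assesses significance by permutation, which this sketch omits.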
Visualization methods implemented in DOSE (e.g. barplot, dotplot, cnetplot, enrichMap, upsetplot and gseaplot) can help interpret the enrichment results.
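```r
# Running-score plot for the top GSEA result: y[1, 1] is the MeSH ID
# and y[1, 2] its description, used as the plot title
gseaplot(y, y[1, 1], title = y[1, 2])
```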
