-omics in 2013
Just how many (bad) -omics are there anyway? Let’s find out.
1. Get the raw data
It would be nice if we could search PubMed for titles containing all -omics:
*omics[TITL]
However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013:
2013[PDAT]
and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size.
2. Extract the -omics
Titles are in column 1 and we only want the -omics, so:
# cut is not CSV-aware: a title containing commas is truncated at its first comma
cut -f1 -d "," pubmed_result.csv | grep -i omics > omics.txt
wc -l omics.txt
# 1770 omics.txt
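Since cut splits on every comma, a CSV-aware reader is one way to keep titles whole. The following is a minimal sketch in R, not what was run for this post: the two-row sample and its column names (Title, URL) are hypothetical stand-ins for pubmed_result.csv, and only the assumption that titles sit in column 1 comes from the text above.

```r
# Hypothetical two-row sample standing in for pubmed_result.csv
csv_lines <- c(
  "Title,URL",
  "\"Proteomics, genomics and beyond\",/pubmed/1",
  "\"A study of mice\",/pubmed/2"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read.csv honours quoted fields, so commas inside a title are preserved
titles <- read.csv(tmp, stringsAsFactors = FALSE)[[1]]

# keep only titles that mention "omics", ignoring case (like grep -i)
omics_titles <- grep("omics", titles, ignore.case = TRUE, value = TRUE)
omics_titles
```

On the real 227 MB file this would be slower than cut and grep, and the number of matching titles would probably differ from 1770.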
3. Clean, rinse, repeat…
We want just a list of -omics words. Time to break out the R. After much trial and error, I ended up with this. Ugly and far from optimized, but it (mostly) works. I say mostly, because I know of at least one case which is not detected: stain-omics.
library(stringr)

omics <- readLines("omics.txt")
omics <- strsplit(omics, " ")            # split titles on space
omics <- unlist(omics)                   # convert to vector of words
omics <- omics[grep("omics", omics)]     # just the -omics words
omics <- gsub("[\"\'\\.:\\?\\[\\]]", "", omics, perl = TRUE)  # remove symbols, punctuation
omics <- tolower(omics)

# matches e.g. "genomics-based"
m <- data.frame(a = omics, b = str_match(omics, "^(.*?omics)-")[, 2])
omics <- ifelse(is.na(m$b), as.character(m$a), as.character(m$b))

# matches e.g. "phospho-proteomics"
m <- data.frame(a = omics, b = str_match(omics, "-{1,}(.*?omics)$")[, 2])
omics <- ifelse(is.na(m$b), as.character(m$a), as.character(m$b))

omics <- unlist(strsplit(omics, "\\/"))  # split e.g. "genomics/proteomics"
omics <- omics[grep("omics", omics)]     # just the -omics words again

# OK we're down to the edge cases now :)
omics <- gsub("applications", "", omics)
omics <- gsub("\\(meta\\)", "meta", omics)
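To see what the two str_match() steps do, and why stain-omics slips through, here is a small sketch on a handful of made-up words. The sample vector is hypothetical and not taken from the PubMed data; the two patterns are the ones used above.

```r
library(stringr)

# Hypothetical sample words, not taken from the PubMed data
words <- c("genomics-based", "phospho-proteomics", "stain-omics", "proteomics")

# Step 1: keep the -omics word that precedes a hyphenated suffix
lead <- str_match(words, "^(.*?omics)-")[, 2]
words <- ifelse(is.na(lead), words, lead)

# Step 2: keep the -omics word that follows a hyphenated prefix
trail <- str_match(words, "-{1,}(.*?omics)$")[, 2]
words <- ifelse(is.na(trail), words, trail)

words
# "genomics" "proteomics" "omics" "proteomics"
```

The third element shows the undetected case: the second pattern treats "stain-" as a prefix, so stain-omics is reduced to plain "omics".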
4. Visualize
The top 20 -omics in 2013 and the less popular:
library(ggplot2)

omics.freq <- as.data.frame(table(omics))
omics.freq <- omics.freq[order(omics.freq$Freq, decreasing = TRUE), ]

ggplot(head(omics.freq, 20)) +
  geom_bar(aes(omics, Freq), stat = "identity", fill = "darkblue") +
  coord_flip() +
  theme_bw()

# and the less popular
subset(omics.freq, Freq == 1)
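The counting step can be checked on a toy vector. This sketch uses made-up words and base R only; the reorder() call at the end is an optional extra that is not part of the code above. It assumes you want the bars sorted by frequency, since ggplot2 otherwise orders a factor axis by its levels, which are alphabetical here.

```r
# Hypothetical toy vector standing in for the cleaned -omics words
omics <- c("proteomics", "genomics", "proteomics", "romics",
           "genomics", "proteomics")

# Count each word, then sort by count, most frequent first
omics.freq <- as.data.frame(table(omics))
omics.freq <- omics.freq[order(omics.freq$Freq, decreasing = TRUE), ]

# Words seen exactly once
singletons <- subset(omics.freq, Freq == 1)

# Optional: order factor levels by frequency so plotted bars are sorted
omics.freq$omics <- reorder(omics.freq$omics, omics.freq$Freq)
levels(omics.freq$omics)
```

With coord_flip(), levels ordered by increasing frequency put the most common word at the top of the chart.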
The bar chart shows the top 20. Top of the list so far for 2013 is proteomics, followed by genomics and metabolomics.
Listed below, those -omics found only once in titles from 2013. Some shockers, I think you’ll agree (paging Jonathan Eisen).

omics                   Freq
aquaphotomics           1
biointeractomics        1
calciomics              1
cholanomics             1
cytogenomics            1
cytokinomics            1
econogenomics           1
glcnacomics             1
glycosaminoglycanomics  1
interactomics           1
ionomics                1
macroeconomics          1
materiomics             1
metalloproteomics       1
metaomics               1
metaproteogenomics      1
microbiomics            1
microeconomics          1
microgenomics           1
microproteomics         1
miromics                1
mitoproteomics          1
mobilomics              1
morphomics              1
museomics               1
neuromics               1
neuropeptidomics        1
nitroproteomics         1
nutrimetabonomics       1
oncogenomics            1
orthoproteomics         1
pangenomics             1
petroleomics            1
pharmacometabolomics    1
pharmacoproteomics      1
phylotranscriptomics    1
phytomics               1
postgenomics            1
pyteomics               1
radiogenomics           1
rehabilomics            1
retrophylogenomics      1
romics                  1
secretomics             1
sensomics               1
speleogenomics          1
surfaceomics            1
surfomics               1
toxicometabolomics      1
vaccinomics             1
variomics               1
Never heard of romics? That’s OK. It’s a surname.