-omics in 2013

[This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers].

Just how many (bad) -omics are there anyway? Let’s find out.

1. Get the raw data

It would be nice if we could search PubMed for titles containing all -omics:

*omics[TITL]

However, we can’t, since leading wildcards don’t work in PubMed searches. So let’s just grab all articles from 2013:

2013[PDAT]

and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size.

2. Extract the -omics
Titles are in column 1 and we only want the -omics, so:

cut -f1 -d "," pubmed_result.csv | grep -i omics > omics.txt
wc -l omics.txt
# 1770 omics.txt
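A caveat on that pipeline: cut splits on every comma and knows nothing about CSV quoting, so a title that contains a comma is cut short there, and any -omics word after the comma is lost. A minimal sketch of the same step in R, assuming (as above) that titles are in column 1 of pubmed_result.csv; read.csv() honours double-quoted fields. The helper name omics_titles and the tiny stand-in file are hypothetical.

```r
# Hypothetical helper: titles (column 1 of the PubMed CSV export) that mention "omics".
# read.csv() honours double-quoted fields, so a comma inside a title does not split it.
omics_titles <- function(path) {
  pubmed <- read.csv(path, stringsAsFactors = FALSE)
  titles <- pubmed[[1]]                               # titles are in column 1
  titles[grepl("omics", titles, ignore.case = TRUE)]  # same filter as grep -i omics
}

# Usage on a tiny stand-in file (the real export is ~227 MB).
# On the first row, cut -f1 -d "," would keep only "Fish oil and miss the match.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Title,URL",
  "\"Fish oil, inflammation and proteomics\",a",
  "\"A study of fish\",b",
  "\"Plant transcriptomics\",c"
), tmp)
hits <- omics_titles(tmp)
print(hits)
```

On the real export this would be called as omics_titles("pubmed_result.csv"); the number of matching titles may differ from the 1770 lines above, since titles are no longer truncated at their first comma.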

3. Clean, rinse, repeat…
We want just a list of -omics words. Time to break out the R. After much trial and error, I ended up with this. Ugly and far from optimized, but it (mostly) works. I say mostly, because I know of at least one case which is not detected: stain-omics.

library(stringr)

omics <- readLines("omics.txt")
omics <- strsplit(omics, " ")            # split titles on space
omics <- unlist(omics)                   # convert to vector of words
omics <- omics[grep("omics", omics)]     # just the -omics words
omics <- gsub("[\"\'\\.:\\?\\[\\]]", "", omics, perl = TRUE)  # remove symbols, punctuation
omics <- tolower(omics)

b <- str_match(omics, "^(.*?omics)-")[, 2]       # matches e.g. "genomics-based"
omics <- ifelse(is.na(b), omics, b)

b <- str_match(omics, "-{1,}(.*?omics)$")[, 2]   # matches e.g. "phospho-proteomics"
omics <- ifelse(is.na(b), omics, b)

omics <- unlist(strsplit(omics, "\\/"))  # split e.g. "genomics/proteomics"
omics <- omics[grep("omics", omics)]     # just the -omics words again

# OK we're down to the edge cases now :)
omics <- gsub("applications", "", omics)
omics <- gsub("\\(meta\\)", "meta", omics)
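To see what each rule does, here is a base-R sketch of the main steps applied to a handful of made-up words. It swaps stringr’s str_match() for regmatches() with regexec(perl = TRUE), which should behave the same for these patterns and needs no extra packages; first_group and clean_omics are hypothetical names, the input words are illustrative rather than taken from the data, and the “applications” and “(meta)” edge-case rules are left out.

```r
# Hypothetical base-R stand-in for stringr::str_match(x, pattern)[, 2]:
# the first capture group of the leftmost match, or NA where there is no match.
first_group <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(m, function(g) if (length(g) > 1) g[2] else NA_character_, character(1))
}

# Hypothetical wrapper around the main cleaning rules (edge cases omitted)
clean_omics <- function(words) {
  words <- tolower(gsub("[\"'.:?\\[\\]]", "", words, perl = TRUE))  # strip symbols
  b <- first_group(words, "^(.*?omics)-")       # "genomics-based" -> "genomics"
  words <- ifelse(is.na(b), words, b)
  b <- first_group(words, "-{1,}(.*?omics)$")   # "phospho-proteomics" -> "proteomics"
  words <- ifelse(is.na(b), words, b)
  words <- unlist(strsplit(words, "/"))         # "genomics/proteomics" -> 2 words
  words[grepl("omics", words)]                  # keep only the -omics words
}

cleaned <- clean_omics(c("Genomics-based", "phospho-proteomics:",
                         "genomics/proteomics", "stain-omics"))
print(cleaned)
```

The last element is the stain-omics case: the second hyphen rule keeps only what follows the hyphen, so the word collapses to a bare “omics”.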

4. Visualize
The top 20 -omics in 2013, and the less popular ones:

library(ggplot2)

omics.freq <- as.data.frame(table(omics))
omics.freq <- omics.freq[order(omics.freq$Freq, decreasing = TRUE), ]
# "+" has to end the line: a line that starts with "+" is parsed as a new expression
ggplot(head(omics.freq, 20)) +
  geom_bar(aes(omics, Freq), stat = "identity", fill = "darkblue") +
  coord_flip() + theme_bw()
# and the less popular
subset(omics.freq, Freq == 1)
The figure shows the top 20. Top of the list so far for 2013 is proteomics, followed by genomics and metabolomics.
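The counting itself is just a frequency table. A small base-R sketch on a made-up vector (not the 2013 data) shows how the ranked list and the singletons fall out of table(); words, freq, top and singletons are illustrative names.

```r
# Illustrative input: a made-up vector of cleaned -omics words (not the 2013 data)
words <- c("proteomics", "genomics", "proteomics", "metabolomics",
           "genomics", "proteomics", "romics")

# One row per word with its count, most frequent first
freq <- as.data.frame(table(omics = words), stringsAsFactors = FALSE)
freq <- freq[order(freq$Freq, decreasing = TRUE), ]

top <- head(freq, 2)                       # the "top n", here n = 2
singletons <- freq$omics[freq$Freq == 1]   # words seen only once
print(top)
print(singletons)
```

One side note on the chart: ggplot2 draws a discrete axis in factor-level order, which for table() output is alphabetical, so the bars are not ranked by frequency. Mapping aes(reorder(omics, Freq), Freq) instead would sort them by count.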

Listed below are the -omics found only once in titles from 2013. Some shockers, I think you’ll agree (paging Jonathan Eisen).

                    omics Freq
          aquaphotomics    1
       biointeractomics    1
             calciomics    1
            cholanomics    1
           cytogenomics    1
           cytokinomics    1
          econogenomics    1
            glcnacomics    1
 glycosaminoglycanomics    1
          interactomics    1
               ionomics    1
         macroeconomics    1
            materiomics    1
      metalloproteomics    1
              metaomics    1
     metaproteogenomics    1
           microbiomics    1
         microeconomics    1
          microgenomics    1
        microproteomics    1
               miromics    1
         mitoproteomics    1
             mobilomics    1
             morphomics    1
              museomics    1
              neuromics    1
       neuropeptidomics    1
        nitroproteomics    1
      nutrimetabonomics    1
           oncogenomics    1
        orthoproteomics    1
            pangenomics    1
           petroleomics    1
   pharmacometabolomics    1
     pharmacoproteomics    1
   phylotranscriptomics    1
              phytomics    1
           postgenomics    1
              pyteomics    1
          radiogenomics    1
           rehabilomics    1
     retrophylogenomics    1
                 romics    1
            secretomics    1
              sensomics    1
         speleogenomics    1
           surfaceomics    1
              surfomics    1
     toxicometabolomics    1
            vaccinomics    1
              variomics    1
[Figure: Top 20 -omics in PubMed titles, 2013]

Never heard of romics? That’s OK. It’s a surname.
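The singleton list also shows that matching on the string “omics” lets in words that are not -omics at all: macroeconomics and microeconomics, plus the surname romics. A sketch of one possible fix, not part of the analysis above: a small stop-list applied to the cleaned words. drop_false_omics is a hypothetical helper, and its two patterns are assumptions drawn only from these examples, so a full run would likely need more.

```r
# Hypothetical stop-list filter for false positives in the cleaned -omics words.
# The two patterns are assumptions based only on the singleton list above.
drop_false_omics <- function(words, stop_patterns = c("economics$", "^romics$")) {
  words[!grepl(paste(stop_patterns, collapse = "|"), words)]
}

kept <- drop_false_omics(c("macroeconomics", "proteomics", "romics",
                           "microeconomics", "ionomics"))
print(kept)
```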


Filed under: bioinformatics, publications, R, statistics Tagged: omics, pubmed
