Annotating limma Results with Gene Names for Affy Microarrays

[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Lately I’ve been using the limma package often for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won’t contain any featureData annotation. Consequentially, when I run topTable to get a list of differentially expressed genes, there’s no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs. There are other ways of annotating these results (INNER JOIN to a MySQL database, biomaRt, etc), but I would like to have the output from topTable already annotated with gene information. Ideally, I could annotate each probeset ID with a gene symbol, gene name, Ensembl ID, and have that Ensembl ID hyperlink out to the Ensembl genome browser. With some help from Gordon Smyth on the Bioconductor Mailing list, I found that annotating the ExpressionSet object results in the output from topTable also being annotated.

The results from topTable are pretty uninformative without annotation:
ID logFC AveExpr t P.Value adj.P.Val B
7902702 -6.8 7.8 -46 1.0e-09 3.3e-05 11.0
8117594 -4.3 10.0 -33 9.6e-09 1.5e-04 10.0
8168794 -3.6 6.5 -30 1.7e-08 1.9e-04 9.7
8103736 -3.4 7.7 -28 3.1e-08 2.5e-04 9.3
7897426 -4.0 7.2 -26 4.7e-08 3.1e-04 9.0
8094278 -3.7 7.0 -24 7.4e-08 3.7e-04 8.7
view raw noanno.txt hosted with ❤ by GitHub


After annotation:
ID Symbol Name Ensembl logFC AveExpr t P.Value adj.P.Val B
7902702 CLCA2 chloride channel accessory 2 <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000137975'>ENSG00000137975</a> -6.832901 7.793344 -46.35621 1.015175e-09 3.281148e-05 11.177075
8117594 HIST1H2BM histone cluster 1, H2bm <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000196374'>ENSG00000196374</a> -4.255756 10.096873 -33.21938 9.579122e-09 1.548034e-04 10.063255
8168794 CENPI centromere protein I <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000102384'>ENSG00000102384</a> -3.612035 6.486830 -30.41465 1.733246e-08 1.867342e-04 9.698515
8103736 SCRG1 stimulator of chondrogenesis 1 <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000164106'>ENSG00000164106</a> -3.433737 7.690753 -27.94797 3.058621e-08 2.471442e-04 9.321687
7897426 CA6 carbonic anhydrase VI <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000131686'>ENSG00000131686</a> -4.048937 7.173867 -26.18729 4.732396e-08 3.059115e-04 9.014277
8094278 NCAPG non-SMC condensin I complex, subunit G <a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000109805'>ENSG00000109805</a> -3.658660 7.008561 -24.48718 7.419854e-08 3.743457e-04 8.681846
view raw afteranno.txt hosted with ❤ by GitHub


You can generate an HTML file with clickable links to the Ensembl Genome Browser for each gene:


Here’s the R code to do it:
# Install the necessary bioconductor packages. Use the appropriate annotation.db file
# for your organism/platform. See http://www.bioconductor.org/packages/release/data/annotation/
source("http://www.bioconductor.org/biocLite.R")
biocLite("affy")
biocLite("limma")
biocLite("annotate")
biocLite("hugene10sttranscriptcluster.db")
install.packages("R2HTML")
# Load required packages
library(affy)
library(limma)
library(annotate)
library(hugene10sttranscriptcluster.db)
library(R2HTML)
# RMA to convert Affybatch to ExpressionSet using robust multichip average
eset <- rma(myaffybatch)
class(eset)
# Which platform?
eset@annotation
# Which objects are in this annotation package?
ls("package:hugene10sttranscriptcluster.db")
# Get the transcript cluster IDs from the expressionset
ID <- featureNames(eset)
# Look up the Gene Symbol, name, and Ensembl Gene ID for each of those IDs
Symbol <- getSYMBOL(ID, "hugene10sttranscriptcluster.db")
Name <- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "GENENAME"))
Ensembl <- as.character(lookUp(ID, "hugene10sttranscriptcluster.db", "ENSEMBL"))
# Note: looking up Ensembl gene IDs by Affy probe IDs can sometimes result in multiple
# gene IDs being returned. This is why the default output from lookUp is a list.
# I'm coercing this list into a character vector so it's the same size as my list of IDs,
# But this is discarding information where one probe ID maps to multiple gene IDs.
# If you're aware of a better solution, please leave it in the comments! Thanks.
# Wherever you have a missing Ensembl ID, make hyperlink missing also.
# If you have an Ensembl ID, create a hyperlink that goes to the
# Ensembl genome browser. Change the URL if you're not using human.
Ensembl <- ifelse(Ensembl=="NA", NA,
paste("<a href='http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=",
Ensembl, "'>", Ensembl, "</a>", sep=""))
# Make a temporary data frame with all those identifiers
tmp <- data.frame(ID=ID, Symbol=Symbol, Name=Name, Ensembl=Ensembl, stringsAsFactors=F)
# The stringsAsFactors makes "NA" characters. This fixes that problem.
tmp[tmp=="NA"] <- NA
# set the featureData for your expressionset using the data frame you created above.
fData(eset) <- tmp
# Clean up
rm(ID, Symbol, Name, Ensembl, tmp)
# Fit the model
fit <- lmFit(eset,design)
fit <- eBayes(fit)
tt <- topTable(fit, coef="tissue")
# Write out an HTML file with clickable links to the Ensembl Genome Browser.
HTML(tt, "out.html", append=F)
view raw annotatelimma.r hosted with ❤ by GitHub

To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)