Importing Illumina BeadArray data into R

[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A colleague needed some help getting Illumina BeadArray gene expression data loaded into R for data analysis with limma. Hopefully whoever ran your arrays can export the data as text files formatted as described in the code below. If so, you can import those text files directly using the beadarray package. This way you avoid getting bogged down with GenomeStudio, which requires a license (ugh) and only runs on Windows (ughhh). Here’s how I do it.

#######################################################################
#
# import-beadstudio.R
# Stephen Turner, December 2014
#
# Ask the core to export text file data with the info below.
# Assumes control probe file has for each sample:
# ProbeID, AVG_Signal, BEAD_STDERR, Avg_NBEADS, Detection Pval.
# Assumes sample probe file has for each sample:
# ProbeID, Symbol, AVG_Signal, BEAD_STDERR, Avg_NBEADS, Detection Pval
# And other annotation columns:
# SEARCH_KEY, ILMN_GENE, CHROMOSOME, DEFINITION, SYNONYMS
#######################################################################
# Load libraries
library(beadarray)
library(limma)
## Pick one.
library(illuminaHumanv4.db)
library(illuminaMousev2.db)
# Load data
## location of your probe and qc files
dataFile <- "data/SampleProbeProfile.txt"
qcFile <- "data/ControlProbeProfile.txt"
## create an expressionset
eset <- readBeadSummaryData(dataFile=dataFile, qcFile=qcFile,
ProbeID="PROBE_ID", controlID="ProbeID",
skip=0, qc.skip=0,
annoCols=c("SYMBOL", "DEFINITION", "SYNONYMS", "CHROMOSOME", "ILMN_GENE", "SEARCH_KEY"))
# Optional: Annotate the samples (example)
## Manually (bad)
pData(eset)$condition <- factor(rep(c("ctl", "trt"), each=3))
## Better / more reproducible to do this by importing a csv/table than doing it manually. You've been warned.
pData <- read.csv("data/metadata.csv", header=TRUE, row.names=1)
# Optional: I use Illumina's annotation. You can annotate the probes yourself if you want.
# See http://www.bioconductor.org/help/workflows/annotation/annotation/
# Optional: Remove probes that aren't annotated with a gene
annotated <- !is.na((fData(eset)$SYMBOL))
table(annotated)
eset <- eset[annotated,]
rm(annotated)
# Normalize, transform
eset <- normaliseIllumina(eset, method="quantile", transform= "log2")
# Some of limma's stuff downstream doesn't work with whatever kind
# of object that you get out of normaliseIllumina().
# I coerce it to an "ExpressionSet" and everything seems to work fine.
class(eset) <- "ExpressionSet"
# Analyze with limma
## Make a design matrix
## Make a contrast matrix
## analyze the normal way: lmFit(), contrasts.fit(), eBayes(), topTable()

To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)