Importing Illumina BeadArray data into R
[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A colleague needed some help getting Illumina BeadArray gene expression data loaded into R for data analysis with limma. Hopefully whoever ran your arrays can export the data as text files formatted as described in the code below. If so, you can import those text files directly using the beadarray package. This way you avoid getting bogged down with GenomeStudio, which requires a license (ugh) and only runs on Windows (ughhh). Here’s how I do it.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
####################################################################### | |
# | |
# import-beadstudio.R | |
# Stephen Turner, December 2014 | |
# | |
# Ask the core to export text file data with the info below. | |
# Assumes control probe file has for each sample: | |
# ProbeID, AVG_Signal, BEAD_STDERR, Avg_NBEADS, Detection Pval. | |
# Assumes sample probe file has for each sample: | |
# ProbeID, Symbol, AVG_Signal, BEAD_STDERR, Avg_NBEADS, Detection Pval | |
# And other annotation columns: | |
# SEARCH_KEY, ILMN_GENE, CHROMOSOME, DEFINITION, SYNONYMS | |
####################################################################### | |
# Load libraries | |
library(beadarray) | |
library(limma) | |
## Pick one. | |
library(illuminaHumanv4.db) | |
library(illuminaMousev2.db) | |
# Load data | |
## location of your probe and qc files | |
dataFile <- "data/SampleProbeProfile.txt" | |
qcFile <- "data/ControlProbeProfile.txt" | |
## create an expressionset | |
eset <- readBeadSummaryData(dataFile=dataFile, qcFile=qcFile, | |
ProbeID="PROBE_ID", controlID="ProbeID", | |
skip=0, qc.skip=0, | |
annoCols=c("SYMBOL", "DEFINITION", "SYNONYMS", "CHROMOSOME", "ILMN_GENE", "SEARCH_KEY")) | |
# Optional: Annotate the samples (example) | |
## Manually (bad) | |
pData(eset)$condition <- factor(rep(c("ctl", "trt"), each=3)) | |
## Better / more reproducible to do this by importing a csv/table than doing it manually. You've been warned. | |
pData <- read.csv("data/metadata.csv", header=TRUE, row.names=1) | |
# Optional: I use Illumina's annotation. You can annotate the probes yourself if you want. | |
# See http://www.bioconductor.org/help/workflows/annotation/annotation/ | |
# Optional: Remove probes that aren't annotated with a gene | |
annotated <- !is.na((fData(eset)$SYMBOL)) | |
table(annotated) | |
eset <- eset[annotated,] | |
rm(annotated) | |
# Normalize, transform | |
eset <- normaliseIllumina(eset, method="quantile", transform= "log2") | |
# Some of limma's stuff downstream doesn't work with whatever kind | |
# of object that you get out of normaliseIllumina(). | |
# I coerce it to an "ExpressionSet" and everything seems to work fine. | |
class(eset) <- "ExpressionSet" | |
# Analyze with limma | |
## Make a design matrix | |
## Make a contrast matrix | |
## analyze the normal way: lmFit(), contrasts.fit(), eBayes(), topTable() | |
To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.