A brief note: R 3.0.0 and bioinformatics
Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post).
Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor to normalize Affymetrix exon microarrays. I began as usual by reading the CEL files:
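library(simpleaffy)  # also loads affy, which provides ReadAffy()
# Escape the dots and anchor the pattern so only files ending in .CEL.gz match
f <- list.files(path = "data/affyexon", pattern = "\\.CEL\\.gz$", full.names = TRUE, recursive = TRUE)
cel <- ReadAffy(filenames = f)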
That is when this happened:
Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData, :
  allocMatrix: too many elements specified
I had a relatively large number of samples (337), but figured a 64-bit machine with ~100 GB RAM should be able to cope. I was wrong: before 3.0.0, R had a hard-coded limit of 2^31 - 1 elements on vector length, and a matrix is stored as a vector, so my matrix had become too large regardless of available memory. See this post and this StackOverflow question for the computational details.
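A little arithmetic shows how quickly that limit is reached. The sketch below is only illustrative: the post does not say how many cells each array holds, so `cells_per_array` is an assumed figure (a 2560 x 2560 grid), and the variable names are mine. Only the sample count (337) comes from the post.

```r
# Largest number of elements a vector (and hence a matrix) could hold before R 3.0.0
max_len <- .Machine$integer.max   # 2^31 - 1

# ASSUMPTION: cells per array is not given in the post; 2560 x 2560 is illustrative
cells_per_array <- 2560 * 2560
n_samples <- 337                  # from the post

# Numeric literals are doubles in R, so this product does not overflow
n_elements <- cells_per_array * n_samples

# Would a cells-by-samples intensity matrix exceed the old limit?
exceeds_limit <- n_elements > max_len

# Largest number of samples that would still fit under the old limit
max_samples <- floor(max_len / cells_per_array)

print(n_elements)
print(exceeds_limit)
print(max_samples)
```

Under that assumption the matrix would need about 2.2 billion elements, just over the limit, no matter how much RAM is installed.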
My solution at the time was to resort to Affymetrix Power Tools. Hopefully, the introduction of long vectors will make Bioconductor even more capable and useful.
Filed under: bioinformatics, programming, R, statistics Tagged: 3.0.0, affymetrix, bioconductor, microarray