Retrieving transcriptome sequences for RNASeq analysis

11 years ago

[This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One approach for analyzing RNASeq data from an organism with a well-annotated genome, is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that you need to extract the transcript sequences from a database. This is how to extract ensembl transcript sequences from UCSC from within R:

_________________________________________________

library(GenomicFeatures)

library(BSgenome.Hsapiens.UCSC.hg18)

tr < - makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")

tr_seq < - extractTranscriptsFromGenome(Hsapiens, tr)

write.XStringSet(tr_seq, file=”hg18.ensgene.transcripts.fasta”, ‘fasta’, width=80, append=F)

_________________________________________________

Next steps can be to build a reference index for bowtie, perform the alignment, and count the number of reads aligned in R using table(). Differential expression analysis may be performed by DESeq.

To leave a comment for the author, please follow the link and comment on their blog: Recipes, scripts and genomics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.