Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The other day, I wanted to add track lines to a GFF file, so that I could view different features as separate custom tracks in a genome browser. The need to shuffle genome coordinates between different file formats seems to occur all the time when you deal with some kind of bioinformatic data. It’s usually just text files; one just has to keep track of whether the positions should start on 0 or 1 and whether the end should include the last base or not . . .
> head(gff)
seqname source feature start end score strand 1 5 protein_coding mRNA 169010747 169031776 . + 2 5 protein_coding protein 169015421 169021641 . + 3 5 protein_coding five_prime_UTR 169010747 169010893 . + 4 5 protein_coding five_prime_UTR 169015398 169015420 . + 5 5 protein_coding CDS 169015421 169015579 . + 6 5 protein_coding CDS 169018052 169018228 . + frame group 1 . ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275 2 . ID=ENSP00000421249;Name=CCDC99-005;Parent=ENST00000504258 3 . Parent=ENST00000504258 4 . Parent=ENST00000504258 5 0 Name=CDS:CCDC99;Parent=ENST00000504258 6 0 Name=CDS:CCDC99;Parent=ENST00000504258
The above example consists of a few lines from the Ensembl human database, not the actual tracks I was interested in. Anyway, this is what I did: instead of using write.table() directly, explicitly open a file for writing, first write some track line, then write the relevant subset, and repeat.
tracks <- unique(gff$feature) connection <- file("separate_tracks.gff", "w") for (k in 1:length(tracks)) { writeLines(paste("track name=", tracks[k], sep=""), connection) write.table(subset(gff, feature==tracks[k]), sep="\t", row.names=F, col.names=F, quote=F, file=connection) } close(connection)
Postat i:med mera Tagged: gff, R
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.