Site icon R-bloggers

get UCSC images for a list of regions in batch

[This article was first published on One Tip Per Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here is my working R code for the task. It can be simplified as 3 lines.


# example of controling individual track
#theURL=”http://genome.ucsc.< wbr>edu/cgi-bin/hgTracks?db=mm9&< wbr>wgRna=hide&cpgIslandExt=pack&< wbr>ensGene=hide&mrna=hide&< wbr>intronEst=hide&mgcGenes=hide&< wbr>hgt.psOutput=on&cons44way=< wbr>hide&snp130=hide&snpArray=< wbr>hide&wgEncodeReg=hide&pix=< wbr>1000&refGene=pack&knownGene=< wbr>hide&rmsk=hide
# example of controling via session
theURL=”http://genome-preview.< wbr>ucsc.edu/cgi-bin/hgTracks?hgS_< wbr>doOtherUser=submit&hgS_< wbr>otherUserName=sterding&hgS_< wbr>otherUserSessionName=mm9_< wbr>piRNA&hgt.psOutput=on&pix=1000< wbr>“

# read regions
toPlot=read.table(“../data/< wbr>piRNA.clusters.coordinates.< wbr>cpg.bed”, header=T)

## paralle version
library(multicore)
# mclapply(1:nrow(toPlot), function(i) screenshotUCSC(theURL, “”, as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste(“region_”, i, “_”, toPlot$name[i],”.pdf”, sep=””)), mc.cores=10)

# anti-robot version 
# UCSC Policy: Program-driven use of this software is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day.
for(i in 1:nrow(toPlot)){
    screenshotUCSC(theURL, “”, as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste(“region_”, i, “_”, toPlot$name[i],”.pdf”, sep=””))
    Sys.sleep(5) 
}

# merge script
mergePDF(“piRNAs_ucsc_< wbr>screeshot.pdf”, list.files(pattern=”region_.*.< wbr>pdf”))
try(system(“rm region_.*.pdf”))

####### lib ############

# Here is an R script wrote by Aaron Statham which saves UCSC to pdfs –
# you can choose which genome and tracks to display by altering the ‘url’ parameter. ‘trackfile’ is the url of a file describing the custom tracks (beds/bigwigs) to display
mergePDF <- function(output=”merged.pdf”, sourcefiles=c(“source1.pdf”,”< wbr>source2.pdf”,”source3.pdf”))
{
    # create the command string and call the command using system()
    # merging command from http://hints.macworld.com/< wbr>article.php?story=< wbr>2003083122212228
    command=paste(“gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite”,paste(“-< wbr>sOutputFile”,output, sep=”=”), paste(sourcefiles, collapse=” “),sep=” “)
    try(system(command))
}


#  Reference: http://www.biostars.org/post/< wbr>show/6132/batch-viewing-of-< wbr>ucsc-browser-graphic/
screenshotUCSC <- function(url, trackfile, chr, start, end, filename) {
        oldpen <- options(“scipen”)
        options(scipen=100)
        temp <- readLines(paste(url, “&hgt.customText=”, trackfile, “&position=”,chr,”:”,start,”-“< wbr>,end, sep=””))
        #cat(temp,”\n”)
        pdfurl <- paste(“http://genome-preview.< wbr>ucsc.edu/trash“,gsub(“.*trash”< wbr>,””,gsub(“.pdf.*”,””,temp[< wbr>grep(“.pdf”, temp, fixed=TRUE)][1])), “.pdf”, sep=””)
        cat(pdfurl,”\n”);
        options(scipen=oldpen)
        download.file(pdfurl, filename, mode=”wb”, quiet=TRUE)
}

To leave a comment for the author, please follow the link and comment on their blog: One Tip Per Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.