CpG Island shelves

[This article was first published on Chitka, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Nowadays, regions with relatively lower CpG density are gaining importance in DNA methylation studies. This is based on the fact that a majority of CpG rich regions (CpG islands) are non-dynamic and less variant in terms of methylation status probed across a variety of tissues and cell populations (Irizarry 2009, Ziller 2013). It is now proven that methylation is more dynamic along the CpG shores (< 2kb flanking CpG Islands) and CpG shelves (<2kb flanking outwards from a CpG shore). While it is easy to retreive the genomic co-ordinates of CpG Islands from UCSC browser, public resources for retrieving the genomic co-ordinates for shores and shelfs were missing or scarce. Here, I am displaying the code (in R using Bioconductor package GenomicRanges) I use for generating objects for CpG islands, CpG island shores and CpG island shelves. The resulting objects are GRanges objects which could be used in multiple downstream applications in Bioconductor related packages.

# This script will generate the cpg_islands, shores and shelves as a granges list.
# updated on 6th june 2014
# eqalizes seqinfo of the cpg objects to hg19 objects (removes minor chromosomes, X, Y)
# Load library, hg19 object, read cpg_island bed file
library(rtracklayer)
load("/home/kalyankpy/hg19/hg19_gr.rda")
source("/home/kalyankpy/Dropbox/myscripts/r_scripts/df2gr_with_seqinfo.r")
cpg_islands=read.table("/home/kalyankpy/reg_elements/cpg_island.bed")
# name the columns
names(cpg_islands)<-c("seqnames","start","end","name")
#replace the column labelled name
cpg_islands$name<-"island"
cpg_islands$start=cpg_islands$start+1 # convert 0 based start in bedfile into 1 based start
# equalize the seqinfo
cpg_islands<-df2gr(cpg_islands,hg19_gr)
###############################################################
# Extract CpG island shores
###############################################################
# extract the shore defined by 2000 bp upstream of cpg islands
shore1=flank(cpg_islands, 2000)
# extract the shore defined by 2000 bp downstream of cpg islands
shore2=flank(cpg_islands,2000,FALSE)
# perform intersection and combine the shores where they overlap
shore1_2=reduce(c(shore1,shore2))
# extract the features (ranges) that are present in shores only and not in cpg_islands (ie., shores not overlapping islands)
cpgi_shores=setdiff(shore1_2, cpg_islands)
cpgi_shores$name="shore"
###############################################################
# Extract CpG island shelves
###############################################################
# extract the shore defined by 4000 bp upstream of cpg islands
shelves1=flank(cpg_islands, 4000)
# extract the shore defined by 2000 bp downstream of cpg islands
shelves2=flank(cpg_islands,4000,FALSE)
# perform intersection and combine the shelves where they overlap
shelves1_2=reduce(c(shelves1,shelves2))
# create a set of ranges consisting CpG Islands, Shores
island_shores=c(cpg_islands, cpgi_shores)
# extract the features (ranges) that are present in shelves only and not in cpg_islands or shores(ie., shelves not overlapping islands or shores)
cpgi_shelves=setdiff(shelves1_2, island_shores)
cpgi_shelves$name="shelf"
#################################################################
# Export the GRanges files into bed for visualization
#################################################################
cpg_elements_gr=GRangesList("islands"=cpg_islands, "shores"=cpgi_shores, "shelves"=cpgi_shelves)
save(cpg_elements_gr, file="/home/kalyankpy/reg_elements/cpg_element_gr.rda")

To leave a comment for the author, please follow the link and comment on their blog: Chitka.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)