Extending methylKit : Extract promoters with differentially methylated CpGs

[This article was first published on CHITKA, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my previous post, I wrote about the features of methylKit. Here, I will discuss how to extend bisulfite sequencing data analysis beyond methylKit.

Annotation is an important feature of genomic analyses. Coming to bisulfite sequencing analyses such as RRBS or WGBS, methylKit does a pretty good job by identifying the differentially methylated individual CpGs or any specific regions/tiles. It also performs basic annotation and plots pie charts indicating where all the differentially methylated CpGs overlap the genic annotations.

The adjacent picture is an example of the kind of annotation performed by methylKit.Using the native functions, methylKit users will be able to annotate the differentially methylated CpGs to the genic regions. Adjacent picture indicates that among all the diffmeth
CpGs 46% overlap with promoters, 27% overlap with intergenic region. Another way to look at the annotations is to identify the list of promoters, exons, introns, intergenic regions that overlap differentially methylated CpGs. There are no methylKit functions to perform the annotations from the point of genic regions.However, methylKit facilitates this by enabling the coercion of the methylKit objects into “GRanges” (GenomicRanges). The following script will help methylKit users in extracting the list of promoter/exons/introns that overlap with differentially methylated CpGs.
# This script illustrates an extension methylKit features. This illustrates how to extract a list of genic features
# that overlap with the differentially methylated CpGs
# Created on 23rd August 2014
# This script assumes all the files are in the working directory
# This script assumes that annotation object has been created using methylKit previously using read.transcript.features()
# Assumes that methylDiff object has been created.
# methylKit vignette explains how each of the above objects are created.
library(methylKit)
library(GenomicRanges)
library(rtracklayer)
# test.diff is a methylDiff object obtained from methylKit. Notice the strand as '+'
test.diff
# gene.obj is the annotation object of GRangesList class
gene.obj
# By default strand information is '+'. In the CpG context, methylation is reciprocal on both strands.
# So, convert the '+' strand into both strands '*'
test.diff$strand="*"
# notice the change in the strand info
test.diff
# coerce test.diff into GRanges object
test_gr=as(test.diff, "GRanges")
# The following script is shown for promoters. This can be used to extract the list of exons, introns or TSS
# Get the list of promoters overlapping diff meth CpGs
diff_promo= subsetByOverlaps(gene.obj$promoters, test_gr)
diff_promo
# Get the list of differentially methylated CpGs overlapping the promoters
diff_promo_cpgs= subsetByOverlaps(test_gr, gene.obj$promoters)
diff_promo_cpgs
# export the extracted list as BED file
export.bed(diff_promo,"diff_promo.bed")
export.bed(diff_promo_cpgs,"diff_promo_cpgs.bed")

To leave a comment for the author, please follow the link and comment on their blog: CHITKA.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)