ClusterProfiles

R on Guangchuang Yu

11 years ago

[This article was first published on YGC » R, and kindly contributed to R-bloggers].

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It is very common to cluster genes based on their expression profiles, and also very common to use Gene Ontology to observe the distribution of biological processes, molecular functions and cellular components for a given gene list. But what if we combine the two? The Gene Ontology distributions across a set of gene clusters may give us new insight into which specific GO terms are related to our biological problem.
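To make the combination concrete, here is a minimal base-R sketch (not the function below, just the underlying idea): given a gene-to-GO-term annotation and a list of gene clusters, it cross-tabulates how many genes of each cluster are annotated to each term. The gene IDs, terms, and the names `annot`, `clusters` and `membership` are hypothetical toy data; real annotations would come from a Bioconductor annotation package.

```r
# Hypothetical toy annotation: one row per (gene, GO term) pair
annot <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
  term = c("cell body", "cell body", "apical part of cell",
           "cell body", "apical part of cell", "apical part of cell"),
  stringsAsFactors = FALSE
)

# Hypothetical gene clusters, e.g. obtained by clustering expression profiles
clusters <- list(A = c("g1", "g2", "g3"), B = c("g4", "g5", "g6"))

# One row per gene, recording the cluster it belongs to
membership <- data.frame(
  gene    = unlist(clusters, use.names = FALSE),
  cluster = rep(names(clusters), times = sapply(clusters, length)),
  stringsAsFactors = FALSE
)

# Attach cluster labels to the annotations, then count genes per (term, cluster)
merged <- merge(annot, membership, by = "gene")
counts <- table(merged$term, merged$cluster)
print(counts)
```

Each row of `counts` is one GO term and each column one cluster, which is the layout the dot plot below visualizes.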

I was inspired by the MCP paper, which I also mentioned in my previous post, and developed an R function to get this job done.

Here is the code:


ClusterProfiles <- function(geneClusters, onto = "CC", level = 3, orgPackage = "org.Hs.eg.db") {
	library(goProfiles)  ## basicProfile()
	library(plyr)        ## llply(), ldply(), ddply()
	library(ggplot2)
	## GO profile of each cluster at the requested ontology and level
	clusterProfile <- llply(geneClusters, function(genes)
		as.data.frame(basicProfile(genes, onto = onto, level = level, orgPackage = orgPackage)))
	## stack the per-cluster profiles; ldply() puts the cluster name in the first column
	clusterProfile.df <- ldply(clusterProfile, rbind)
	colnames(clusterProfile.df) <- c("Cluster", "Description", "GOID", "Frequency")
	clusterProfile.df <- clusterProfile.df[clusterProfile.df$Frequency != 0, ]  ## drop empty terms
	clusterProfile.df$Description <- as.character(clusterProfile.df$Description) ## un-factor
	## within each GO term (one plot row), the share of its genes coming from each cluster
	clusterProfile.df <- ddply(clusterProfile.df, .(Description), transform,
		Percent = Frequency/sum(Frequency), Total = sum(Frequency))
	## label GO Description with gene counts, e.g. "cell body (3)"
	clusterProfile.df$Description <- paste(clusterProfile.df$Description, " (",
		clusterProfile.df$Total, ")", sep = "")
	clusterProfile.df$Total <- NULL  ## drop the *Total* column
	mtitle <- paste(onto, "Ontology Distribution", sep = " ")
	p <- ggplot(clusterProfile.df, aes(x = Cluster, y = Description, size = Percent))
	p <- p + geom_point(colour = "steelblue") + labs(title = mtitle, x = "", y = "")
	p <- p + theme(axis.text.x = element_text(colour = "black", size = 11, vjust = 1),
		axis.text.y = element_text(colour = "black", size = 11, hjust = 1))
	result <- list(data = clusterProfile.df, p = p)
	return(result)
}
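The core reshaping step of ClusterProfiles, stacking one GO profile per cluster into a long table and dropping zero counts, can be sketched in base R without plyr. This assumes each per-cluster profile is a data frame with Description, GOID and Frequency columns, as the colnames() call in the function implies. The `profiles` list is hypothetical: its counts are taken from the example output further down, plus one made-up zero row for cluster B.

```r
# Hypothetical per-cluster GO profiles (same columns as used in ClusterProfiles)
profiles <- list(
  B = data.frame(Description = c("apical part of cell", "cell body"),
                 GOID = c("GO:0045177", "GO:0044297"),
                 Frequency = c(1, 0), stringsAsFactors = FALSE),
  C = data.frame(Description = c("apical part of cell", "cell body"),
                 GOID = c("GO:0045177", "GO:0044297"),
                 Frequency = c(1, 2), stringsAsFactors = FALSE),
  D = data.frame(Description = c("apical part of cell", "cell body"),
                 GOID = c("GO:0045177", "GO:0044297"),
                 Frequency = c(2, 1), stringsAsFactors = FALSE)
)

# Base-R counterpart of ldply(profiles, rbind): prepend the cluster name, then stack
stacked <- do.call(rbind, lapply(names(profiles), function(cl)
  data.frame(Cluster = cl, profiles[[cl]], stringsAsFactors = FALSE)))

# Drop GO terms to which no gene of that cluster is annotated
stacked <- stacked[stacked$Frequency != 0, ]
print(stacked)
```

plyr keeps the function short; the base-R version only makes explicit what ldply() and the zero filter do.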

The input *geneClusters* is a named list of clusters, each element a vector of gene IDs (Entrez Gene IDs in the example below).
For the other parameters, refer to the reference manual of the Bioconductor package goProfiles.
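One way to build *geneClusters* is to cluster an expression matrix and split the gene IDs by cluster membership. The sketch below assumes a hypothetical matrix `expr` (random values, gene IDs as row names); hierarchical clustering with cutree() is only one choice, and k-means or any other method that yields a named membership vector works the same way.

```r
set.seed(1)
# Hypothetical expression matrix: 12 genes x 6 samples, row names are gene IDs
expr <- matrix(rnorm(72), nrow = 12,
               dimnames = list(c("3838", "29766", "51070", "483", "56667", "5573",
                                 "8971", "10755", "389", "8531", "55905", "3024"),
                               paste0("s", 1:6)))

# Cluster genes on their expression profiles (correlation distance, average linkage)
d  <- as.dist(1 - cor(t(expr)))
hc <- hclust(d, method = "average")
membership <- cutree(hc, k = 4)  # named integer vector: gene ID -> cluster number

# Named list of gene-ID vectors, one element per cluster (names "A" to "D")
geneClusters <- split(names(membership), LETTERS[membership])
str(geneClusters)
```

The list names become the Cluster column of the returned data and the x axis of the plot.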

The example below illustrates how to use it.

> names(geneClusters)
[1] "A" "B" "C" "D"
> geneClusters[1]
$A
 [1]  3838 29766 51070   483 56667  5573  8971 10755   389  8531 55905  3024  7169  1595   387

> clust.go.prof = ClusterProfiles(geneClusters)
> head(clust.go.prof$data)
  Cluster             Description       GOID Frequency   Percent
1       B apical part of cell (4) GO:0045177         1 0.2500000
2       C apical part of cell (4) GO:0045177         1 0.2500000
3       D apical part of cell (4) GO:0045177         2 0.5000000
4       C           cell body (3) GO:0044297         2 0.6666667
5       D           cell body (3) GO:0044297         1 0.3333333
6       C  cell division site (1) GO:0032153         1 1.0000000
> clust.go.prof$p

The function returns a list containing the cluster profiles annotated with GO (data), which may be useful for further analysis, and a graph (p) that plots the GO distributions across clusters.

The size of each dot reflects Percent, the fraction of a GO term's genes that fall in that cluster, so the percentages in each row sum to 1.
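The Percent values and the "(n)" labels can be checked by hand against the example output above. This base-R sketch recomputes them with ave() instead of ddply(), using the Cluster, Description and Frequency values of the six rows printed by head(); it is a verification aid, not part of the original function.

```r
# The six rows printed by head(clust.go.prof$data), with labels stripped of their counts
df <- data.frame(
  Cluster     = c("B", "C", "D", "C", "D", "C"),
  Description = c(rep("apical part of cell", 3), rep("cell body", 2), "cell division site"),
  Frequency   = c(1, 1, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Percent: each cluster's share of the genes annotated to that GO term (one plot row)
df$Percent <- ave(df$Frequency, df$Description, FUN = function(f) f / sum(f))
# Total genes per term, used for the "(n)" suffix of the axis labels
df$Total <- ave(df$Frequency, df$Description, FUN = sum)
df$Label <- paste(df$Description, " (", df$Total, ")", sep = "")
print(df[, c("Cluster", "Label", "Frequency", "Percent")])
```

For "apical part of cell", clusters B, C and D contribute 1, 1 and 2 of its 4 genes, giving 0.25, 0.25 and 0.5, as in the printed output.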

I plan to develop version 2 of this function to let users specify which GO categories they are interested in.
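Until then, one possible workaround is to subset the returned data to the GO terms of interest and redraw the plot from it with the same ggplot() call. A minimal sketch, assuming a data frame with the columns of clust.go.prof$data; `selectTerms` is a hypothetical helper, not part of ClusterProfiles or goProfiles, and the rows are copied from the example output.

```r
# Rows copied from the example output of head(clust.go.prof$data)
go.data <- data.frame(
  Cluster     = c("B", "C", "D", "C", "D", "C"),
  Description = c(rep("apical part of cell (4)", 3), rep("cell body (3)", 2),
                  "cell division site (1)"),
  GOID        = c(rep("GO:0045177", 3), rep("GO:0044297", 2), "GO:0032153"),
  Frequency   = c(1, 1, 2, 2, 1, 1),
  Percent     = c(0.25, 0.25, 0.5, 2/3, 1/3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the GO terms the user asks for.
# Percent is computed within each GO term, so dropping whole terms leaves it valid.
selectTerms <- function(data, goids) {
  kept <- data[data$GOID %in% goids, ]
  rownames(kept) <- NULL
  kept
}

selected <- selectTerms(go.data, c("GO:0045177", "GO:0032153"))
print(selected)
```

Filtering by GOID rather than Description avoids having to match the "(n)" suffix added to the labels.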

Any comments or suggestions are welcome.

P.S. My thanks go to Tal Galili for inviting me to join R-bloggers. I am very happy to see my post appear on R-bloggers.
