Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week in our blog we started a list of great R code (www.r-project.org) snippets: http://cloudnumbers.com/what-is-your-favorite-r-feature
We are going to extend this list with several more nice R features. Please feel free to add comments with your favorite R code snippets.
Descriptive statistics:
R offers a large set of tools for describing and exploring data. The built-in data set "attenu" gives the peak accelerations measured at various observation stations for 23 earthquakes in California. For example, try the command summary(), which (in this case) gives you a compact descriptive overview of every column:
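attenu                               # print the whole data set
dim(attenu)                          # number of rows and columns
attenu[1:10, ]                       # first 10 rows
summary(attenu, digits = 4)          # descriptive summary per column
pairs(attenu, main = "attenu data")  # scatterplot matrix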
This code example prints the data set ‘attenu’, its dimensions, and its first 10 rows. Finally, it presents a summary table and a scatterplot matrix of all variables.
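Beyond summary(), grouped and pairwise statistics are often the next step. The following is a minimal sketch using only base R and the columns event, mag, dist and accel of attenu; the variable names mean_accel_by_event, numeric_cols and cor_matrix are illustrative choices, not part of the original example:

```r
# attenu ships with base R (package 'datasets'); no extra packages are needed
data(attenu)

# Mean peak acceleration per earthquake (one value per event)
mean_accel_by_event <- tapply(attenu$accel, attenu$event, mean)

# Correlation matrix of the numeric measurement columns
numeric_cols <- attenu[, c("mag", "dist", "accel")]
cor_matrix <- cor(numeric_cols)

print(round(mean_accel_by_event, 3))
print(round(cor_matrix, 2))
```

tapply() splits the first argument by the grouping variable and applies the function to each group, which is a compact alternative to subsetting the data frame by hand.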
R programming:
There are many ways to program in R. To write your first own R function, please see chapter 10 ("Writing your own functions") of the R manual “An Introduction to R”.
For example, this code creates a first function that adds two numbers:
addfunc <- function(x, y) {
  z <- x + y
  return(z)
}
addfunc(3, 2)
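Because `+` is vectorized in R, a function like this also works element-wise on whole vectors. A small sketch of that idea follows; the name add_numbers() and the input check are illustrative additions, not part of the original snippet:

```r
# Same idea as addfunc(), with a basic input check (the check is an illustrative addition)
add_numbers <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x + y  # the value of the last expression is returned
}

single_sum <- add_numbers(3, 2)
vector_sum <- add_numbers(1:3, 4:6)  # element-wise: 1+4, 2+5, 3+6

print(single_sum)
print(vector_sum)
```

An explicit return() is optional here, since an R function returns the value of its last evaluated expression.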
Or create simple for loops:
for (i in 1:5) print(1:i)
for (n in c(2, 5, 10, 20, 50)) {
  x <- stats::rnorm(n)
  cat(n, ":", sum(x^2), "\n")
}
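Loops that collect one result per iteration can often be written with sapply() instead. The sketch below repeats the sum-of-squares computation from the second loop; the fixed seed and the names sample_sizes and sums_of_squares are assumptions added for illustration:

```r
set.seed(42)  # fixed seed so the random draws are reproducible
sample_sizes <- c(2, 5, 10, 20, 50)

# For each n, draw n standard normal values and sum their squares
sums_of_squares <- sapply(sample_sizes, function(n) sum(stats::rnorm(n)^2))
names(sums_of_squares) <- sample_sizes

print(sums_of_squares)
```

Unlike the loop with cat(), this keeps the results in a named vector that can be reused later, for example for plotting.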
A bioinformatics example: Normalization of Microarray Data
The Bioconductor repository (http://www.bioconductor.org/) provides many packages for the analysis of genomic data. The affydata package is a simple data package. It provides an example dataset drawn from an actual Dilution experiment done by Gene Logic (http://www.genelogic.com/support/scientific-studies/). A standard pre-analysis step for microarray data is normalization, which removes systematic technical differences between arrays (for example, those introduced during array production).
library(affydata)
data(Dilution)
Dilution
phenoData(Dilution)
pData(Dilution)
# first plot
boxplot(Dilution, col = c(2, 2, 3, 3))
## pick only a few genes to reduce calculation time
gn <- sample(geneNames(Dilution), 100)
pms <- pm(Dilution[, 3:4], gn)
mva.pairs(pms)
# normalization
normalized.Dilution <- Biobase::combine(normalize(Dilution[, 1:2]), normalize(Dilution[, 3:4]))
normalize.methods(Dilution)
# second plot
boxplot(normalized.Dilution, col = c(2, 2, 3, 3), main = "Normalized Arrays")
pms <- pm(normalized.Dilution[, 3:4], gn)
mva.pairs(pms)
Compare the plots before and after normalization!
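The effect of normalization can also be illustrated without Bioconductor. The following is a minimal base R sketch of quantile normalization (one of the methods listed by normalize.methods()) applied to simulated data. The simulated matrix, the array-specific shifts and the helper quantile_normalize() are assumptions made for illustration; this is not the affy implementation:

```r
set.seed(1)

# Simulated log-intensities: 100 probes (rows) x 4 arrays (columns),
# with an artificial array-specific shift standing in for technical variation
raw <- matrix(rnorm(400, mean = 8), nrow = 100, ncol = 4)
raw <- sweep(raw, 2, c(0, 0.5, 1, 1.5), "+")

# Hypothetical helper: force every column to share the same distribution
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  sorted <- apply(x, 2, sort)                        # each array sorted in ascending order
  reference <- rowMeans(sorted)                      # mean of the k-th smallest values across arrays
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

normalized <- quantile_normalize(raw)

boxplot(raw, main = "Simulated arrays before normalization")
boxplot(normalized, main = "Simulated arrays after quantile normalization")
```

After this step all four columns contain the same set of values, only in a different order, so the boxplots line up exactly; real array data treated with affy will look similar but not identical to this idealized case.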
For more details see the affydata documentation (http://www.bioconductor.org/packages/2.8/data/experiment/html/affydata.html).
We will come up with more nice R features especially for high-performance computing with R in the next blog posts. Please feel free to add comments with your favorite R code snippets.