[This article was first published on rstats - MetaEvoPhyloEcoOmics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Apparently, there is a simple problem called Fizz buzz which is sometimes used to identify competent programmers. A good opportunity to practice some dplyr and magrittr tricks.
It shows how to solve a simple programming exercise know as FizzBuzz in R, but also is clearly showing off his already amazing proficiency in the very new R package magrittr. Those are some piping hot coding skillz (or should I say (not-a-)piping hot?)
Clearly this was a challenge to see who can come up with the most ridiculously long and complicated expression that uses only a single string of functions piped together with the magic of %>%. In that spirit, I have decided to try and replicate one of the more complicated analyses and figures in my first paper—a phylogenetic ordination—using no intermediate variables and lots of %>%-ing. Actually, I am going to use a different method for the phylogenetic ordination from the paper, and I think this one is better (I will post more on this later).
I download the data directly from the supplemental material on my paper, so you should be able to run this code on your computer if you want (as long as you have an internet connection). This requires the following packages:
magrittr
ape
plyr
dplyr
vegan
ggplot2
library(ape)
library(plyr)
library(dplyr)
library(magrittr)
library(vegan)
library(ggplot2)
dat<-scan("http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0007071.s002",what=character(0)) %>% #download data
read.tree(text=.) %>% #make it a phylo object
cophenetic %>% # turn it into a phylogenetic distance matrix
l(.,x -> cmdscale(x,nrow(x)-1)) %>% # get phylogenetic principle components
l(.,x -> x[order(rownames(x)), ]) %>% # order by rownames
aaply("http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0007071.s001" %>%
read.csv %>% # read-in community data
l(x -> x %>% set_rownames(paste(rownames(x),x[,2],sep="_"))) %>% #save treatment names for later in the rownames
extract(c(-1,-2)) %>% # toss non-species data
l(x -> x[,order(colnames(x))]) %>% # order by column names
as.matrix, # make it a matrix
1,function(x, y) (y*x) %>% colSums, y=.) %>% #make a function to pipe in
# community data as x, and phylo data (.) as y, then multiple each community abundance
# by phylo principle components. now we have a phylogenetic feature vector for each
# site!
metaMDS("euclidean") %>% # use non-metric multidimensional scaling on phylogenetic features
extract("points") %>% # pull out the ordinated points
data.frame %>% # make it data.frame
l(., x -> mutate(x,treatment=rownames(x) %>%
strsplit("_") %>%
laply(function(y) y[2]))) %>% #extract treatment names from rownames
ggplot(.,aes(x=points.MDS1,y=points.MDS2)) + geom_point(aes(color=factor(treatment)),size=5) #make a ggplot
## Run 0 stress 0.1055
## Run 1 stress 0.1055
## ... procrustes: rmse 0.01079 max resid 0.05315
## Run 2 stress 0.1199
## Run 3 stress 0.1198
## Run 4 stress 0.1057
## ... procrustes: rmse 0.006186 max resid 0.03488
## Run 5 stress 0.1055
## ... procrustes: rmse 0.01088 max resid 0.05316
## Run 6 stress 0.121
## Run 7 stress 0.1408
## Run 8 stress 0.1055
## ... procrustes: rmse 0.01081 max resid 0.05317
## Run 9 stress 0.1055
## ... New best solution
## ... procrustes: rmse 0.000166 max resid 0.0007816
## *** Solution reached
dat #plot ordination!!
We can see that the main difference between the disturbed and undisturbed sites is that there is a much larger variance in phylogenetic composition between disturbed sites. This has interesting implications which has inspired me to do a follow-up to my old paper, on
this blog. Look for that soon.
So that’s a phylogenetic ordination, done in a single line of code (well, it would be a single line if I removed all the linebreaks). Not a single intermediate variable was used. Note that if you do try this code, you will get some warnings, they don’t affect the outcome. It’s not nearly as elegant as Andrew’s, but I would hate to see what this would look like if I tried nesting all of these functions!
Did I mention that I love magrittr (you could probably guess that from the gif that I made, below).
This is a guest post by Stefan Milton, the author of the magrittr package which introduces the %>% operator to R programming. Preface (by Tal Galili) I was first introduced to the %>% (a.k.a: pipe) operator in R, thanks to Hadley Wickham’s (fascinating) dplyr tutorial (link to the workshop’s material) at useR!2014. After several discussions during…
F# made pipes popular among data scientists, and the magrittr package brought pipes to R. For example, with magrittr, you could write: [crayon-5532c01154e7a698823689/] instead of [crayon-5532c01154e80418780039/] For example: [crayon-5532c01154e85653101897/] becomes: [crayon-5532c01154e89139820003/] or even (to take it to the extreme): [crayon-5532c01154e8d865474926/] This doesn’t look like a revolution, but in practice…
Let’s consider piping in R both using the magrittr package and using the wrapr package. magrittr pipelines The magittr pipe glyph “%>%” is the most popular piping symbol in R. magrittr documentation describes %>% as follow. Basic piping: x %>% f is equivalent to f(x) x %>% f(y) is equivalent…