A phylogenetic ordination in one line: magrittr Challenge!

[This article was first published on rstats - MetaEvoPhyloEcoOmics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A picture suitable for a fancy pipe enthusiast

My good friend Andrew recently posted this gist:

Apparently, there is a simple problem called Fizz buzz which is sometimes used to identify competent programmers. A good opportunity to practice some dplyr and magrittr tricks.

library(dplyr)
library(magrittr)
library(knitr)

1:100 %>%
  data.frame %>%
  set_names("num") %>%
  tbl_df %>%
  mutate(fizz=num %>% mod(3) %>% equals(0) %>% ifelse("fizz",num),
         buzz=num %>% mod(5) %>% equals(0) %>% ifelse("buzz",fizz),
         fizzbuzz= num %>% mod(15) %>% equals(0) %>% ifelse("fizzbuzz",buzz)
         ) %>%
  select(num,fizzbuzz) %>%
  kable("markdown")
num fizzbuzz
1 1
2 2
3 fizz
4 4
5 buzz
6 fizz
7 7
8 8
9 fizz
10 buzz
11 11
12 fizz
13 13
14 14
15 fizzbuzz
16 16
17 17
18 fizz
19 19
20 buzz
21 fizz
22 22
23 23
24 fizz
25 buzz
26 26
27 fizz
28 28
29 29
30 fizzbuzz
31 31
32 32
33 fizz
34 34
35 buzz
36 fizz
37 37
38 38
39 fizz
40 buzz
41 41
42 fizz
43 43
44 44
45 fizzbuzz
46 46
47 47
48 fizz
49 49
50 buzz
51 fizz
52 52
53 53
54 fizz
55 buzz
56 56
57 fizz
58 58
59 59
60 fizzbuzz
61 61
62 62
63 fizz
64 64
65 buzz
66 fizz
67 67
68 68
69 fizz
70 buzz
71 71
72 fizz
73 73
74 74
75 fizzbuzz
76 76
77 77
78 fizz
79 79
80 buzz
81 fizz
82 82
83 83
84 fizz
85 buzz
86 86
87 fizz
88 88
89 89
90 fizzbuzz
91 91
92 92
93 fizz
94 94
95 buzz
96 fizz
97 97
98 98
99 fizz
100 buzz
view raw fizzbuzz.md hosted with ❤ by GitHub

It shows how to solve a simple programming exercise know as FizzBuzz in R, but also is clearly showing off his already amazing proficiency in the very new R package magrittr. Those are some piping hot coding skillz (or should I say (not-a-)piping hot?)

Clearly this was a challenge to see who can come up with the most ridiculously long and complicated expression that uses only a single string of functions piped together with the magic of %>%. In that spirit, I have decided to try and replicate one of the more complicated analyses and figures in my first paper—a phylogenetic ordination—using no intermediate variables and lots of %>%-ing. Actually, I am going to use a different method for the phylogenetic ordination from the paper, and I think this one is better (I will post more on this later).

I download the data directly from the supplemental material on my paper, so you should be able to run this code on your computer if you want (as long as you have an internet connection). This requires the following packages:

  • magrittr
  • ape
  • plyr
  • dplyr
  • vegan
  • ggplot2
library(ape)
library(plyr)
library(dplyr)
library(magrittr)
library(vegan)
library(ggplot2)
dat<-scan("http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0007071.s002",what=character(0)) %>% #download data
  read.tree(text=.) %>% #make it a phylo object
  cophenetic %>% # turn it into a phylogenetic distance matrix
  l(.,x -> cmdscale(x,nrow(x)-1)) %>% # get phylogenetic principle components
  l(.,x -> x[order(rownames(x)), ]) %>% # order by rownames
  aaply("http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0007071.s001" %>% 
                          read.csv %>% # read-in community data
                          l(x -> x %>% set_rownames(paste(rownames(x),x[,2],sep="_"))) %>% #save treatment names for later in the rownames
                          extract(c(-1,-2)) %>% # toss non-species data
                          l(x -> x[,order(colnames(x))]) %>% # order by column names
                          as.matrix, # make it a matrix
        1,function(x, y) (y*x) %>% colSums, y=.) %>% #make a function to pipe in 
  # community data as x, and phylo data (.) as y, then multiple each community abundance
  # by phylo principle components. now we have a phylogenetic feature vector for each
  # site!
  metaMDS("euclidean") %>% # use non-metric multidimensional scaling on phylogenetic features
  extract("points") %>% # pull out the ordinated points
  data.frame %>% # make it  data.frame
  l(., x -> mutate(x,treatment=rownames(x) %>%
                  strsplit("_") %>% 
                    laply(function(y) y[2]))) %>% #extract treatment names from rownames
  ggplot(.,aes(x=points.MDS1,y=points.MDS2)) + geom_point(aes(color=factor(treatment)),size=5) #make a ggplot

## Run 0 stress 0.1055 
## Run 1 stress 0.1055 
## ... procrustes: rmse 0.01079  max resid 0.05315 
## Run 2 stress 0.1199 
## Run 3 stress 0.1198 
## Run 4 stress 0.1057 
## ... procrustes: rmse 0.006186  max resid 0.03488 
## Run 5 stress 0.1055 
## ... procrustes: rmse 0.01088  max resid 0.05316 
## Run 6 stress 0.121 
## Run 7 stress 0.1408 
## Run 8 stress 0.1055 
## ... procrustes: rmse 0.01081  max resid 0.05317 
## Run 9 stress 0.1055 
## ... New best solution
## ... procrustes: rmse 0.000166  max resid 0.0007816 
## *** Solution reached

dat #plot ordination!!

A phylogenetic ordination for disturbed and undisturbed sites

You can also find the above as a gist.

We can see that the main difference between the disturbed and undisturbed sites is that there is a much larger variance in phylogenetic composition between disturbed sites. This has interesting implications which has inspired me to do a follow-up to my old paper, on
this blog. Look for that soon.

So that’s a phylogenetic ordination, done in a single line of code (well, it would be a single line if I removed all the linebreaks). Not a single intermediate variable was used. Note that if you do try this code, you will get some warnings, they don’t affect the outcome. It’s not nearly as elegant as Andrew’s, but I would hate to see what this would look like if I tried nesting all of these functions!

Did I mention that I love magrittr (you could probably guess that from the gif that I made, below).

Happy piping!

A gif a giant nerd made

To leave a comment for the author, please follow the link and comment on their blog: rstats - MetaEvoPhyloEcoOmics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)