Good riddance to Excel pivot tables

[This article was first published on Recology, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Excel pivot tables have been how I have reorganized data…up until now. These are just a couple of examples why R is superior to Excel for reorganizing data:


################ Good riddance to pivot tables ############
library(reshape2)
library(plyr)
 
dataset <- data.frame(var1 = rep(c("a","b","c","d","e","f"), each = 4), 
 var2 = rep(c("level1","level1","level2","level2"), 6), 
 var3 = rep(c("h","m"), 12), meas = rep(1:12))



# simply pivot table
cast(dataset, var1 ~ var2 + var3)
Using meas as value column.  Use the value argument to cast to override this choice
  var1 level1_h level1_m level2_h level2_m
1    a        1        2        3        4
2    b        5        6        7        8
3    c        9       10       11       12
4    d        1        2        3        4
5    e        5        6        7        8
6    f        9       10       11       12
 
# mean by var1 and var2
cast(dataset, var1 ~ var2, mean)
Using meas as value column.  Use the value argument to cast to override this choice
  var1 level1 level2
1    a    1.5    3.5
2    b    5.5    7.5
3    c    9.5   11.5
4    d    1.5    3.5
5    e    5.5    7.5
6    f    9.5   11.5

 
# mean by var1 and var3
cast(dataset, var1 ~ var3, mean)
Using meas as value column.  Use the value argument to cast to override this choice
  var1  h  m
1    a  2  3
2    b  6  7
3    c 10 11
4    d  2  3
5    e  6  7
6    f 10 11
 
# mean by var1, var2 and var3 (version 1)
cast(dataset, var1 ~ var2 + var3, mean)
Using meas as value column.  Use the value argument to cast to override this choice
  var1 level1_h level1_m level2_h level2_m
1    a        1        2        3        4
2    b        5        6        7        8
3    c        9       10       11       12
4    d        1        2        3        4
5    e        5        6        7        8
6    f        9       10       11       12
 
# mean by var1, var2 and var3 (version 2)
cast(dataset, var1 + var2 ~ var3, mean)
Using meas as value column.  Use the value argument to cast to override this choice
   var1   var2  h  m
1     a level1  1  2
2     a level2  3  4
3     b level1  5  6
4     b level2  7  8
5     c level1  9 10
6     c level2 11 12
7     d level1  1  2
8     d level2  3  4
9     e level1  5  6
10    e level2  7  8
11    f level1  9 10
12    f level2 11 12

 
# use package plyr to create flexible data frames...
dataset_plyr <- ddply(dataset, .(var1, var2), summarise, 
 mean = mean(meas), 
 se = sd(meas),
 CV = sd(meas)/mean(meas)
)
> dataset_plyr
   var1   var2 mean        se         CV
1     a level1  1.5 0.7071068 0.47140452
2     a level2  3.5 0.7071068 0.20203051
3     b level1  5.5 0.7071068 0.12856487
4     b level2  7.5 0.7071068 0.09428090
5     c level1  9.5 0.7071068 0.07443229
6     c level2 11.5 0.7071068 0.06148755
7     d level1  1.5 0.7071068 0.47140452
8     d level2  3.5 0.7071068 0.20203051
9     e level1  5.5 0.7071068 0.12856487
10    e level2  7.5 0.7071068 0.09428090
11    f level1  9.5 0.7071068 0.07443229
12    f level2 11.5 0.7071068 0.06148755
 
# ...to use for plotting
qplot(var1, mean, colour = var2, size = CV, data = dataset_plyr, geom = "point")

To leave a comment for the author, please follow the link and comment on their blog: Recology.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)