Note that this is largely a repeat of a previous post (except that I have added a few plots at the bottom) as I am experimenting with being able to write posts here directly from R using the knit2wp() function (in the knitr package) and the R Markdown language.

If successful, this will allow my posts with R code to show the results produced, which will make the posts more readable. [Currently I don’t like that the results look like the source code, but I have not figured out how to reliably fix that yet.]

I apologize for cluttering your mailboxes. Let me know if you have any comments or suggestions.
I came across a “problem” today where I needed to create catch data for individual nets from length measurements made on individual fish in those nets. In other words, I had data that showed three individual length measurements for Brook Trout, two measurements for Lake Trout, and two measurements for Rainbow Trout in net #1 and I needed a data frame that showed these catch amounts (i.e., the three, two, and two). Of course, the real problem had more fish and more nets.
The ddply() function from the plyr package works very well for this type of problem, as illustrated below. Basically, this function breaks your original data frame into smaller groups (in this case nets), applies some function to each group (in this case length() of the fish length variable, which counts the measurements and therefore the number of fish caught), and then combines the results from each group back into a single data frame. Hadley Wickham, the author of plyr, calls this the Split-Apply-Combine strategy.
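To make the three steps concrete, here is a minimal base R sketch of the same idea, without plyr. The `fish` data frame and its values are made up for illustration; this only mimics what ddply() does conceptually and is not how plyr is implemented.

```r
# Hypothetical toy data: one row per measured fish
fish <- data.frame(net=c(1,1,1,2,2),
                   species=c("BKT","BKT","LKT","BKT","RBT"),
                   tl=c(111,108,107,96,94))

# Split: one small data frame per net
pieces <- split(fish, fish$net)

# Apply: count the length measurements (i.e., the fish) in each piece
counts <- lapply(pieces, function(d) data.frame(net=d$net[1],
                                                catch=length(d$tl)))

# Combine: stack the per-net results back into one data frame
res <- do.call(rbind, counts)
res
```

ddply() wraps these three steps into a single call and handles the bookkeeping for several grouping variables at once.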
First, let’s make some toy data for the example. Because the lengths are drawn with rnorm(), your tl values will differ from those shown.
    lens <- data.frame(net=rep(c(1,2,3),c(7,5,6)),
                       eff=rep(c(1,2,2),c(7,5,6)),
                       temp=rep(c(17,15.5,16.5),c(7,5,6)),
                       species=c(rep(c("BKT","LKT","RBT"),c(3,2,2)),
                                 rep(c("BKT","LKT"),c(2,3)),
                                 rep(c("BKT","RBT"),c(4,2))),
                       tl=round(rnorm(18,mean=100,sd=10),0)
                       )
    lens
    ##    net eff temp species  tl
    ## 1    1   1 17.0     BKT 111
    ## 2    1   1 17.0     BKT 108
    ## 3    1   1 17.0     BKT 106
    ## 4    1   1 17.0     LKT 107
    ## 5    1   1 17.0     LKT 103
    ## 6    1   1 17.0     RBT 105
    ## 7    1   1 17.0     RBT  96
    ## 8    2   2 15.5     BKT  96
    ## 9    2   2 15.5     BKT  94
    ## 10   2   2 15.5     LKT  94
    ## 11   2   2 15.5     LKT 119
    ## 12   2   2 15.5     LKT  97
    ## 13   3   2 16.5     BKT 108
    ## 14   3   2 16.5     BKT  91
    ## 15   3   2 16.5     BKT 111
    ## 16   3   2 16.5     BKT 109
    ## 17   3   2 16.5     RBT  94
    ## 18   3   2 16.5     RBT 102
Then let’s use ddply() to turn this into catch data. Here, ddply() takes the original data frame as the first argument, a one-sided formula naming the grouping variables as the second argument, the summarize() function (without the parentheses) as the third argument, and then the name of a new variable set equal to a function that computes a summary (length() of the fish length variable in this case). The original data frame is split into groups based on unique combinations of the net and species variables. The eff(ort) and temp(erature) variables are constant within each net, so including them in the formula does not create any additional groups; they are simply carried along with net into the final data frame.
    library(plyr)
    catch1 <- ddply(lens,~net+eff+temp+species,
                    summarize,catch=length(tl))
    catch1
    ##   net eff temp species catch
    ## 1   1   1 17.0     BKT     3
    ## 2   1   1 17.0     LKT     2
    ## 3   1   1 17.0     RBT     2
    ## 4   2   2 15.5     BKT     2
    ## 5   2   2 15.5     LKT     3
    ## 6   3   2 16.5     BKT     4
    ## 7   3   2 16.5     RBT     2
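If you want to convince yourself that adding eff and temp to the grouping formula does not change the groups, a quick base R check is to count the unique combinations with and without them. This is only a sanity-check sketch; it rebuilds the grouping columns of the toy data (tl is not needed here), and the names `groups_small` and `groups_full` are mine.

```r
# Grouping columns of the toy data from this post (tl omitted)
lens <- data.frame(net=rep(c(1,2,3),c(7,5,6)),
                   eff=rep(c(1,2,2),c(7,5,6)),
                   temp=rep(c(17,15.5,16.5),c(7,5,6)),
                   species=c(rep(c("BKT","LKT","RBT"),c(3,2,2)),
                             rep(c("BKT","LKT"),c(2,3)),
                             rep(c("BKT","RBT"),c(4,2))))

# Unique groups defined by net and species only
groups_small <- unique(lens[, c("net","species")])

# Unique groups when eff and temp are included as well
groups_full <- unique(lens[, c("net","eff","temp","species")])

# The two counts agree when eff and temp are constant within each net
nrow(groups_small) == nrow(groups_full)
```

If the comparison were FALSE in your own data, some net would have more than one effort or temperature value recorded, and the grouping would be finer than one row per net and species.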
A common problem with this type of data is that mean catch per net will not be computed properly because some species were not captured in some nets, but no zero was entered for those species in those nets. The addZeroCatch() function in the FSA package can be used to automatically (though not quickly) enter these zeroes. This function requires the data frame with catches as the first argument, the name of the variable that identifies the net as the second argument, the name of the variable that identifies the species as the third argument, and a vector of names of variables that should be set to zero in the zerovar= argument. This process is illustrated below.
    library(FSA)
    catch2 <- addZeroCatch(catch1,"net","species",
                           zerovar="catch")
    catch2[order(catch2$net,catch2$species),]
    ##    net eff temp species catch
    ## 1    1   1 17.0     BKT     3
    ## 2    1   1 17.0     LKT     2
    ## 3    1   1 17.0     RBT     2
    ## 4    2   2 15.5     BKT     2
    ## 5    2   2 15.5     LKT     3
    ## 41   2   2 15.5     RBT     0
    ## 6    3   2 16.5     BKT     4
    ## 61   3   2 16.5     LKT     0
    ## 7    3   2 16.5     RBT     2
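For readers who want to see what zero-filling amounts to, here is a rough base R sketch of the same idea using merge(). It is not how addZeroCatch() is implemented, just one way to get an equivalent result; the `catch`, `nets`, `grid`, and `full` names are mine, and temp is left out to keep it short.

```r
# Catch data as produced above, retyped so this runs on its own
catch <- data.frame(net=c(1,1,1,2,2,3,3),
                    eff=c(1,1,1,2,2,2,2),
                    species=c("BKT","LKT","RBT","BKT","LKT","BKT","RBT"),
                    catch=c(3,2,2,2,3,4,2))

# Net-level information, one row per net
nets <- unique(catch[, c("net","eff")])

# Every net-by-species combination that should exist; merge() on
# data frames with no common columns returns their Cartesian product
grid <- merge(nets, data.frame(species=unique(catch$species)))

# Left join the observed catches; combinations never observed get NA
full <- merge(grid, catch, by=c("net","eff","species"), all.x=TRUE)

# Replace those NAs with zero catches
full$catch[is.na(full$catch)] <- 0
full[order(full$net, full$species), ]
```

The result has one row for each of the nine net-by-species combinations, with zeroes for Rainbow Trout in net 2 and Lake Trout in net 3.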
Now, for example, the mean and SD of catch-per-unit-effort (CPE) per species can be computed.
    catch2$cpe <- catch2$catch/catch2$eff
    ( cpesum <- ddply(catch2,~species,
                      summarize,mean.cpe=mean(cpe),sd.cpe=sd(cpe)) )
    ##   species mean.cpe sd.cpe
    ## 1     BKT    2.000  1.000
    ## 2     LKT    1.167  1.041
    ## 3     RBT    1.000  1.000
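To see why the zeroes matter, the sketch below compares mean CPE by species with and without the zero-catch rows. It uses base R's tapply() rather than ddply() so that it runs without extra packages; the zero-filled catches are retyped from the output above, and the names `with.zeroes`, `without.zeroes`, and `pos` are mine.

```r
# Zero-filled catch data, retyped from the output above (temp omitted)
catch2 <- data.frame(net=rep(1:3, each=3),
                     eff=rep(c(1,2,2), each=3),
                     species=rep(c("BKT","LKT","RBT"), times=3),
                     catch=c(3,2,2, 2,3,0, 4,0,2))
catch2$cpe <- catch2$catch/catch2$eff

# With the zeroes: every net contributes to every species' mean
with.zeroes <- tapply(catch2$cpe, catch2$species, mean)

# Without the zeroes: nets that caught none of a species drop out
pos <- subset(catch2, catch > 0)
without.zeroes <- tapply(pos$cpe, pos$species, mean)

round(rbind(with.zeroes, without.zeroes), 3)
```

Dropping the zeroes raises the mean CPE for Lake Trout and Rainbow Trout, because the nets in which they were not caught no longer count toward the average. Brook Trout is unaffected because it was caught in every net.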
As an example, you can make a histogram of the lengths of Brook Trout in the original data frame

    with(subset(lens,species=="BKT"),
         hist(tl,xlab="Total Length",main="",col="gray90"))

or a barplot of the mean CPE by species

    with(cpesum,
         barplot(mean.cpe,names.arg=species,
                 ylab="Mean CPE",xlab="Species"))
Obviously, this is a toy example, but it can be scaled up to larger projects.
Filed under: Fisheries Science, R Tagged: Data Manipulation, plyr, R