Split, Apply, and Combine for ffdf
Call me incompetent, but I just can’t get ffdfdply to work with my ffdf data frames. I’ve tried repeatedly, and I’ve seen numerous examples on Stack Overflow, but maybe I’m applying them incorrectly. Wanting to do some split-apply-combine on an ffdf yet again, I finally broke down and wrote my own function, which seems to do the job! It’s still crude, I think, and it will probably break down when there are NA values in the vector that you want to split on, but here it is:
mtapply = function (dvar, ivar, funlist) {
  # One row per level of ivar, one column per function name in funlist
  groups = names(table(ivar))
  outtable = matrix(NA, length(groups), length(funlist),
                    dimnames = list(groups, funlist))
  for (f in funlist) {
    # match.fun() looks the function up by name, so no eval(parse()) is needed
    outtable[, f] = as.matrix(tapply(dvar, ivar, match.fun(f)))
  }
  return (outtable)
}
As you can see, I’ve made it so that the result is a matrix with one tapply vector per column. "dvar", unsurprisingly, is your dependent variable, and "ivar" is your independent (grouping) variable. "funlist" is a vector of function names typed in as strings, e.g. c("median", "mean", "sd"). I’ve wasted so much of my time trying to get ddply or ffdfdply to work on an ffdf that I’m happy to now have anything that does the job for me.
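A quick way to try the function out is on a plain in-memory data frame. The sketch below uses the built-in mtcars data purely as a stand-in and restates mtapply so that the snippet runs on its own. For a real ffdf, the two columns would presumably have to be pulled into RAM first (for example with `[]` on each ff column), which is an assumption here and only works if they fit in memory.

```r
# mtapply restated (lightly tidied) so this snippet is self-contained
mtapply = function (dvar, ivar, funlist) {
  groups = names(table(ivar))
  outtable = matrix(NA, length(groups), length(funlist),
                    dimnames = list(groups, funlist))
  for (f in funlist) {
    outtable[, f] = as.matrix(tapply(dvar, ivar, match.fun(f)))
  }
  return (outtable)
}

# mtcars stands in for the ffdf: mpg is the dependent variable,
# cyl (number of cylinders) is the variable we split on
res = mtapply(mtcars$mpg, mtcars$cyl, c("median", "mean"))
print(res)
```

The result has one row per value of cyl and one column per function name, so individual cells can be picked out with something like res["4", "mean"].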
Now that I think about it, this will fall short if you ask it to output more than one quantile for each of your split levels. If you can improve this function, please be my guest!
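One possible way around the multiple-quantile limitation is to split the data first and let each function return as many values as it likes. The following is only a rough sketch, not a tested replacement: the name mtapply_multi and the idea of passing a named list of functions are made up for illustration, and mtcars again stands in for real data.

```r
# Hypothetical variant: each function may return more than one value per group
mtapply_multi = function (dvar, ivar, funlist) {
  # A plain character vector of function names doubles as the column labels
  if (is.character(funlist) && is.null(names(funlist))) {
    names(funlist) = funlist
  }
  groups = split(dvar, ivar)  # one vector of dvar values per level of ivar
  rows = lapply(groups, function (x) {
    # Apply every function to this group and flatten the results into one row
    unlist(lapply(funlist, function (f) match.fun(f)(x)))
  })
  do.call(rbind, rows)        # stack the rows: one per level of ivar
}

# Two quantiles per group alongside the mean
res = mtapply_multi(
  mtcars$mpg, mtcars$cyl,
  list(mean = mean, q = function (x) quantile(x, c(0.25, 0.75)))
)
print(res)
```

Because the functions are passed as objects rather than only as strings, an anonymous wrapper such as function (x) mean(x, na.rm = TRUE) can also be used to skip NA values in the dependent variable.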