Which functions in plyr do people use?
This is the question that Hadley Wickham recently set out to answer by asking frequent R and plyr users, in an online survey, how they use the package.
Once a decent number of people had responded, Hadley quickly produced a short analysis of the plyr usage survey and published it on RPubs. With his permission, I am re-posting his analysis here:
Plyr usage survey results
Thanks to everyone who took part! I received 124 responses in about 24 hours, which is super awesome. This document gives a quick writeup of the results.
Function usage
Overall, function usage was much as I expected: ddply is by far the most commonly used function, followed by ldply and dlply, then llply. This is reassuring because for the next iteration of plyr, I’m planning to focus on ddply, ldply and dlply.
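For readers less familiar with the naming scheme, the first letter of each function gives the input type and the second the output type (d for data frame, l for list). The following is a minimal sketch of the four functions mentioned above. The built-in mtcars data set and the per-group linear model are illustrative choices, not part of the survey.

```r
library(plyr)

# ddply: data frame in, data frame out (mean mpg per cylinder count)
by_cyl <- ddply(mtcars, "cyl", summarise, mean_mpg = mean(mpg))

# dlply: data frame in, list out (one linear model per cylinder group)
models <- dlply(mtcars, "cyl", function(d) lm(mpg ~ wt, data = d))

# ldply: list in, data frame out (collect the coefficients of each model)
coefs <- ldply(models, coef)

# llply: list in, list out (number of observations behind each model)
n_obs <- llply(models, function(m) length(resid(m)))
```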
Other functions
I didn’t perform a formal analysis of the free text “other functions”, but common themes were:
- parallelisation
- progress bars
- join
- mutate, summarise, arrange
- colwise
- count
- rbind.fill
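As a rough illustration of what several of these helpers do, here is a small sketch. The data frames a and b are made up for the example and do not come from the survey.

```r
library(plyr)

# Two toy data frames (hypothetical data, for illustration only)
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = c(1L, 3L), y = c("u", "v"))

# join: left join by default; id 2 has no match in b, so y is NA there
joined <- join(a, b, by = "id")

# rbind.fill: row-bind data frames with different columns, padding with NA
stacked <- rbind.fill(a, data.frame(id = 4L, z = TRUE))

# count: frequency table of the unique values of a variable
cyl_counts <- count(mtcars, "cyl")

# arrange: reorder rows, here by descending mpg
sorted <- arrange(mtcars, desc(mpg))

# colwise: turn a vector function into one applied to every column
col_means <- colwise(mean)(a)
```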
Comments
Again, no formal analysis, but the common themes were:
- You like plyr – thanks!
- Make plyr faster – this is a big motivation for the next iteration, and initial explorations are promising: I should be able to get a 10-100x speedup for many cases.
- Documentation and examples could be better – I know, but good documentation is hard!
- summarise now works sequentially (i.e. you can refer to columns you just created)
- there’s a new progress bar (thanks to Mike Lawrence) that estimates the amount of time remaining
- a new here function makes it easier to use ddply + summarise/mutate/subset inside a function
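Two of the features listed above can be sketched briefly. The example assumes a plyr version that includes here, and uses mtcars purely for illustration. The function scaled_mean and its argument k are hypothetical names.

```r
library(plyr)

# summarise evaluates its arguments in order, so `range` can refer to
# the `lo` and `hi` columns created just before it
s <- summarise(mtcars, lo = min(mpg), hi = max(mpg), range = hi - lo)

# Without here(), summarise called through ddply cannot see `k`, because
# it is local to scaled_mean; here() captures the calling context
scaled_mean <- function(df, k) {
  ddply(df, "cyl", here(summarise), scaled = mean(mpg) * k)
}
out <- scaled_mean(mtcars, 10)
```

The same wrapping works for mutate and subset, for example here(mutate) in place of here(summarise).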