Extending existing packages: Rmisc
One of my favorite packages is Rmisc. It includes the summarySE function, which I use on a daily basis. The function provides a concise way to get a data frame with the mean and the standard error of the mean for each group. Combined with ggplot2, it is a great way to show differences between groups visually. Let's look at a toy example: we create a data set, aggregate it with Rmisc::summarySE(), and plot the results with ggplot2.
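As a rough illustration of that workflow, here is a minimal sketch. The data frame `visits` and its columns (`day`, `group`, `sessions`, `bouncerate`) are invented for this example and are not the data from the original post; the random numbers only stand in for real traffic.

```r
library(Rmisc)    # provides summarySE()
library(ggplot2)

set.seed(42)

# Hypothetical play data: 30 days of traffic for two page variants
visits <- data.frame(
  day        = rep(1:30, times = 2),
  group      = rep(c("A", "B"), each = 30),
  sessions   = round(runif(60, min = 50, max = 1000)),
  bouncerate = c(rnorm(30, mean = 0.45, sd = 0.05),
                 rnorm(30, mean = 0.50, sd = 0.05))
)

# One row per group with N, the (unweighted) mean, sd, se and ci
agg <- summarySE(visits, measurevar = "bouncerate", groupvars = "group")

# Group means with error bars of one standard error
p <- ggplot(agg, aes(x = group, y = bouncerate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = bouncerate - se, ymax = bouncerate + se),
                width = 0.2) +
  labs(x = "Group", y = "Mean bounce rate (+/- SE)")
print(p)
```

Note that summarySE stores the group mean in a column named after the measure variable (here `bouncerate`), which is why the plot maps `y` to that column.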
Pretty straightforward. If you have followed this blog, you might have noticed that this is my preferred way to compare statistics between groups. However, there is one key drawback: the aggregation in this case is incorrect. It is a mistake to take the plain mean of a ratio (bounce rate) when its N (sessions) varies over time, because a day with few sessions then counts as much as a day with many. The correct way is to use the mean weighted by sessions, which yields a slightly different overall mean.
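To see why the weighting matters, consider a deliberately exaggerated sketch with two invented days, one quiet and one busy. The numbers are made up for illustration; with real data the gap is usually smaller. It uses only `weighted.mean()` from base R.

```r
# Hypothetical two-day example: a quiet day and a busy day
sessions   <- c(100, 1000)
bouncerate <- c(0.8, 0.4)

# Plain mean: both days count equally
simple_mean <- mean(bouncerate)

# Weighted mean: each day counts in proportion to its sessions
weighted_mean <- weighted.mean(bouncerate, w = sessions)

# Same quantity computed directly: total bounces / total sessions
bounces <- bouncerate * sessions
overall <- sum(bounces) / sum(sessions)

print(simple_mean)    # 0.6
print(weighted_mean)  # 0.4363636
print(overall)        # same value as weighted_mean
```

The weighted mean matches the overall bounce rate you would get by pooling all sessions, which is the number you actually want to report.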
However, Rmisc does not offer a way to aggregate means with weights. I ignored the issue for some time, but last week I decided to give back and add a weighted.summarySE function. I looked at the package in its repository, copied the original summarySE function, and changed a few lines. (Please see the full code at the end of the post.)
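For readers who want to experiment, here is a minimal base-R sketch of the idea. It is not the code from the pull request: the name `weighted_summary_se`, its arguments, and the choice of weighted standard deviation are assumptions made for this illustration. In particular, the variance is normalised by `sum(w) * (n - 1) / n`, which reduces to the ordinary sample variance when all weights are equal; other definitions exist and give different standard errors.

```r
# Hypothetical helper, not part of Rmisc: per-group weighted mean, sd, se, ci.
weighted_summary_se <- function(data, measurevar, groupvars, weights,
                                conf.interval = 0.95) {
  # One data frame per combination of the grouping variables
  pieces <- split(data, data[groupvars], drop = TRUE)

  rows <- lapply(pieces, function(d) {
    x <- d[[measurevar]]
    w <- d[[weights]]
    n <- length(x)
    m <- weighted.mean(x, w)

    # Assumption: weighted sd normalised by sum(w) * (n - 1) / n.
    # With a single row per group this is undefined (division by zero).
    s  <- sqrt(sum(w * (x - m)^2) / (sum(w) * (n - 1) / n))
    se <- s / sqrt(n)
    ci <- se * qt(conf.interval / 2 + 0.5, df = n - 1)

    out <- d[1, groupvars, drop = FALSE]
    out$N <- n
    out[[measurevar]] <- m   # mean is stored under the measure's name
    out$sd <- s
    out$se <- se
    out$ci <- ci
    out
  })

  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

# Invented data: group A has unequal sessions, group B has equal sessions
visits <- data.frame(
  group      = c("A", "A", "A", "B", "B", "B"),
  sessions   = c(100, 1000, 400, 200, 200, 200),
  bouncerate = c(0.8, 0.4, 0.5, 0.3, 0.5, 0.7)
)

res <- weighted_summary_se(visits, measurevar = "bouncerate",
                           groupvars = "group", weights = "sessions")
print(res)
```

The returned data frame mirrors the column layout of summarySE output (group columns, N, the measure, sd, se, ci), so plotting code written against summarySE results should need little or no change.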
With that function it is again pretty straightforward to create the chart, this time with the key improvement that the means are correct :).
While I am still waiting for Ryan to accept my pull request, I hope this post inspires you to give feedback on existing packages or to add the functions you miss in them. Happy extending!
Well ordered source code: