Aggregate Function in R: Making your life easier, one mean at a time

hayward

11 years ago

[This article was first published on Psychwire » R, and kindly contributed to R-bloggers.]


I previously posted about calculating medians using R. I used tapply to do it, but I’ve since found something that feels easier to use (at least to me).

# Median of DV for every combination of IV1 and IV2
aggregated_output <- aggregate(DV ~ IV1 * IV2,
                               data = data_to_aggregate, FUN = median)
aggregated_output

The above code saves an aggregated dataset to aggregated_output, with the median of each group in the DV column. The function to apply (median, mean, or whatever else you want) is specified by FUN=. The value to summarise is the left-hand side of the formula, DV (dependent variable).

The aggregate function also returns one column for each IV (independent variable), so every row is labelled with the group it describes. You can have as many IVs as you like. Here, I have two, and these are specified by IV1 * IV2. In aggregate's formula interface the * only lists the grouping variables; IV1 + IV2 is the more common spelling and groups the same way.
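To make this concrete, here is a minimal sketch on made-up data. The values are invented purely for illustration; only the names DV, IV1, IV2 and data_to_aggregate follow the code above.

```r
# Toy data, invented for illustration: one DV and two grouping variables
data_to_aggregate <- data.frame(
  IV1 = c("a", "a", "a", "b", "b", "b", "a", "b"),
  IV2 = c("x", "x", "y", "x", "y", "y", "y", "x"),
  DV  = c(1, 3, 10, 4, 7, 9, 20, 6)
)

# One row per IV1/IV2 combination, with the median of DV in the DV column
aggregated_output <- aggregate(DV ~ IV1 * IV2,
                               data = data_to_aggregate, FUN = median)
aggregated_output
```

With this toy data the result has four rows, one per combination of IV1 and IV2, and three columns named IV1, IV2 and DV.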

Those of you who are familiar with relational databases will see immediately that this function is somewhat similar to GROUP BY in SQL (MySQL, for example). The bonus is that you don’t need to SELECT the IV columns that you want returned; aggregate includes them automatically. For example, take a look at this query, which uses AVG and so corresponds to FUN=mean rather than FUN=median:


SELECT IV1, IV2, AVG(DV) FROM data_to_aggregate GROUP BY IV1, IV2
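For comparison, a rough R counterpart of that query might look like the sketch below. Because the SQL uses AVG, the sketch passes FUN = mean. The data values and the name aggregated_means are made up for illustration.

```r
# Made-up data for illustration only
data_to_aggregate <- data.frame(
  IV1 = c("a", "a", "b", "b"),
  IV2 = c("x", "x", "y", "y"),
  DV  = c(2, 4, 10, 20)
)

# Roughly: SELECT IV1, IV2, AVG(DV) FROM data_to_aggregate GROUP BY IV1, IV2
aggregated_means <- aggregate(DV ~ IV1 * IV2,
                              data = data_to_aggregate, FUN = mean)
aggregated_means
```

As with GROUP BY, only the combinations of IV1 and IV2 that actually occur in the data get a row, so this toy data yields two rows rather than four.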

There is apparently more than one way to skin a cat (even if it’s a cat that’s made of data).
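As one illustration of those different ways, the sketch below computes the same medians with tapply and with aggregate. The data are invented, and the tapply call is only an assumption about how such a calculation might be written, not necessarily the code from the earlier post.

```r
# Made-up data for illustration only
data_to_aggregate <- data.frame(
  IV1 = c("a", "a", "b", "b", "a", "b"),
  IV2 = c("x", "x", "x", "x", "y", "y"),
  DV  = c(1, 3, 4, 6, 10, 8)
)

# tapply: a matrix with IV1 levels as rows and IV2 levels as columns
medians_matrix <- with(data_to_aggregate,
                       tapply(DV, list(IV1, IV2), median))

# aggregate: a data frame with one row per IV1/IV2 combination
medians_frame <- aggregate(DV ~ IV1 * IV2,
                           data = data_to_aggregate, FUN = median)

medians_matrix
medians_frame
```

The numbers are the same either way; the difference is the shape. tapply returns a cross-tabulated matrix, while aggregate returns a long data frame that is usually easier to pass on to plotting or modelling functions.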
