Grouped means (or anything else…)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
An easy one today, but something that stumped me for a while* the first time I tried it out.
How do you get a group mean (or other summary statistic) in R? Let's say you have a response variable Y, with repeated observations for each combination of levels of however many factors.
You could subset the data by each combination of the X (grouping) variables. Something like:
trt1alt1 <- mean(data$Y[data$trt == 1 & data$alt == 1])
trt1alt2 <- mean(data$Y[data$trt == 1 & data$alt == 2])
trt1alt3 <- mean(data$Y[data$trt == 1 & data$alt == 3])
trt2alt1 <- mean(data$Y[data$trt == 2 & data$alt == 1])
trt2alt2 <- mean(data$Y[data$trt == 2 & data$alt == 2])
trt2alt3 <- mean(data$Y[data$trt == 2 & data$alt == 3])
...
would do the trick. But that's daft. For one thing, it takes a long time to type or edit, especially if you have a lot of groups.
The better way is to use aggregate.
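To make the subsetting approach concrete, here is a minimal sketch using a small made-up data set (the data frame and its values are hypothetical, chosen so the group means come out as whole numbers). Because data$Y is a vector, the logical condition goes inside single brackets with no comma.

```r
# Hypothetical example data: 2 levels of trt x 3 levels of alt, 2 replicates each
data <- data.frame(
  trt = rep(1:2, each = 6),
  alt = rep(1:3, times = 4),
  Y   = c(4, 7, 6, 6, 9, 2, 5, 8, 7, 7, 10, 3)
)

# Logical indexing picks out the Y values for one trt/alt combination
trt1alt1 <- mean(data$Y[data$trt == 1 & data$alt == 1])
trt1alt1   # [1] 5

# Every other combination needs its own line like this one,
# which is why the approach scales badly as the number of groups grows
trt2alt3 <- mean(data$Y[data$trt == 2 & data$alt == 3])
trt2alt3   # [1] 5
```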
aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN=mean)
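This returns a data frame with one row per combination of trt and alt: the grouping columns, plus a column (named x) holding the mean of Y for that group. No fuss, no bother. If you need to pass other arguments to mean, such as its na.rm argument, that's possible too, because any extra arguments after FUN are passed on to it: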
aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN=mean, na.rm=TRUE)
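A minimal sketch of both calls on a small hypothetical data set (values made up for illustration) with one missing Y value. Without na.rm = TRUE the group containing the NA gets an NA mean; with it, the NA is dropped within that group before averaging. The grouping columns are taken from the names given in the by list.

```r
# Hypothetical data: 2 levels of trt x 3 levels of alt, 2 replicates each,
# with the first Y value missing
data <- data.frame(
  trt = rep(1:2, each = 6),
  alt = rep(1:3, times = 4),
  Y   = c(NA, 7, 6, 6, 9, 2, 5, 8, 7, 7, 10, 3)
)

# One row per trt/alt combination; the group means are in column x
means <- aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN = mean)
print(means)   # the trt = 1, alt = 1 group is NA because of the missing value

# na.rm = TRUE is passed through to mean(), so the NA is dropped per group
means_narm <- aggregate(data$Y, by = list(trt = data$trt, alt = data$alt),
                        FUN = mean, na.rm = TRUE)
print(means_narm)
```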
aggregate also works with other summary functions, built-in or custom. There are other options too, such as the data.table package or the ddply function from the plyr package. Some of the apply family (tapply, for example) can also handle the simpler single-factor case.
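A minimal sketch of both ideas on a small hypothetical data set. group_range is a made-up helper used only to show that FUN can be any function that takes a vector and returns a single value; tapply is base R and returns a named vector (strictly, a one-dimensional array) of per-level results for a single grouping factor.

```r
# Hypothetical data: 2 levels of trt x 3 levels of alt, 2 replicates each
data <- data.frame(
  trt = rep(1:2, each = 6),
  alt = rep(1:3, times = 4),
  Y   = c(4, 7, 6, 6, 9, 2, 5, 8, 7, 7, 10, 3)
)

# Hypothetical custom summary: the spread (max - min) within each group
group_range <- function(v) max(v) - min(v)
spreads <- aggregate(data$Y, by = list(trt = data$trt, alt = data$alt),
                     FUN = group_range)
print(spreads)

# Single grouping factor: tapply gives the mean of Y for each level of alt
alt_means <- tapply(data$Y, data$alt, mean)
print(alt_means)
```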
* I say a while... I mean an hour or so, so not long at all.