Easy cell statistics for factorial designs
A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations.
There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and describe.by() in the psych package (renamed describeBy() in later releases of psych). However, I find it easiest to use the melt() and cast() functions in the reshape package.
As an example, consider the mtcars dataframe (included in the base installation) containing road test information on automobiles assessed in 1974. Suppose that you want to obtain the means, standard deviations, and sample sizes for the variables miles per gallon (mpg), horsepower (hp), and weight (wt). You want these statistics for all cars in the dataset, separately by transmission type (am) and number of gears (gear), and for the cells formed by crossing these two variables.
You can accomplish this with the following code:
    options(digits = 3)
    library(reshape)

    # define and name the statistics of interest
    stats <- function(x) c(N = length(x), Mean = mean(x), SD = sd(x))

    # label the levels of the classification variables (optional)
    # note: gear is the number of forward gears, not cylinders
    mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
    mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                          labels = c("3-Gear", "4-Gear", "5-Gear"))

    # melt the dataset
    dfm <- melt(mtcars,
                # outcome variables
                measure.vars = c("mpg", "hp", "wt"),
                # classification variables
                id.vars = c("am", "gear"))

    # statistics for the entire sample
    cast(dfm, variable ~ ., stats)

    # statistics for cells defined by transmission type
    cast(dfm, am + variable ~ ., stats)

    # statistics for cells defined by number of gears
    cast(dfm, gear + variable ~ ., stats)

    # statistics for cells defined by each am x gear combination
    cast(dfm, am + gear + variable ~ ., stats)

The output is given below:

      variable  N   Mean     SD
    1      mpg 32  20.09  6.027
    2       hp 32 146.69 68.563
    3       wt 32   3.22  0.978

             am variable  N   Mean     SD
    1 Automatic      mpg 19  17.15  3.834
    2 Automatic       hp 19 160.26 53.908
    3 Automatic       wt 19   3.77  0.777
    4    Manual      mpg 13  24.39  6.167
    5    Manual       hp 13 126.85 84.062
    6    Manual       wt 13   2.41  0.617

        gear variable  N   Mean      SD
    1 3-Gear      mpg 15  16.11   3.372
    2 3-Gear       hp 15 176.13  47.689
    3 3-Gear       wt 15   3.89   0.833
    4 4-Gear      mpg 12  24.53   5.277
    5 4-Gear       hp 12  89.50  25.893
    6 4-Gear       wt 12   2.62   0.633
    7 5-Gear      mpg  5  21.38   6.659
    8 5-Gear       hp  5 195.60 102.834
    9 5-Gear       wt  5   2.63   0.819

              am   gear variable  N   Mean      SD
    1  Automatic 3-Gear      mpg 15  16.11   3.372
    2  Automatic 3-Gear       hp 15 176.13  47.689
    3  Automatic 3-Gear       wt 15   3.89   0.833
    4  Automatic 4-Gear      mpg  4  21.05   3.070
    5  Automatic 4-Gear       hp  4 100.75  29.010
    6  Automatic 4-Gear       wt  4   3.30   0.157
    7     Manual 4-Gear      mpg  8  26.27   5.414
    8     Manual 4-Gear       hp  8  83.88  24.175
    9     Manual 4-Gear       wt  8   2.27   0.461
    10    Manual 5-Gear      mpg  5  21.38   6.659
    11    Manual 5-Gear       hp  5 195.60 102.834
    12    Manual 5-Gear       wt  5   2.63   0.819

Note that only the am x gear combinations that actually occur in the data appear in the last table; there are no manual cars with 3 gears and no automatic cars with 5 gears.
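If you want to double-check the per-transmission figures without the reshape package, one option is base R's aggregate(). The sketch below is only a cross-check, not part of the approach described in this post. It works on a fresh copy of the data (the name cars is my own), so am keeps its original 0/1 coding, where 0 is automatic and 1 is manual.

```r
# Base-R cross-check of the per-transmission statistics (no reshape needed).
# `cars` is an illustrative name for an unmodified copy of mtcars.
cars <- datasets::mtcars

# same statistics function as in the post
stats <- function(x) c(N = length(x), Mean = mean(x), SD = sd(x))

# one row per level of am; each outcome becomes a matrix column
# with sub-columns N, Mean, and SD
by_am <- aggregate(cbind(mpg, hp, wt) ~ am, data = cars, FUN = stats)

# pull out the statistics for miles per gallon
mpg_stats <- data.frame(am = by_am$am, by_am$mpg)
print(mpg_stats)
```

The Mean column of mpg_stats should agree with the 17.15 (automatic) and 24.39 (manual) values reported above. The nested matrix columns are what makes the aggregate() result less convenient to read than the cast() output.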
The approach is easily generalized to any number of grouping variables (factors), dependent/outcome variables, and statistics, and gives you a powerful tool for slicing and dicing data.
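As a rough illustration of that generalization, the sketch below adds a median to the statistics, a fourth outcome (qsec), and a third classification variable (vs, engine shape). The names stats2, cars, dfm2, by_vs, and by_all are my own choices for the example, and the classification variables are left with their original numeric codes.

```r
library(reshape)

# work on an unmodified copy of the data (illustrative name)
cars <- datasets::mtcars

# extended set of statistics: the original three plus the median
stats2 <- function(x) c(N = length(x), Mean = mean(x),
                        SD = sd(x), Median = median(x))

# melt with one more outcome and one more classification variable
dfm2 <- melt(cars,
             measure.vars = c("mpg", "hp", "wt", "qsec"),
             id.vars = c("am", "gear", "vs"))

# statistics for cells defined by engine shape alone
by_vs <- cast(dfm2, vs + variable ~ ., stats2)
print(by_vs)

# statistics for every observed am x gear x vs combination
by_all <- cast(dfm2, am + gear + vs + variable ~ ., stats2)
print(by_all)
```

Only the formula and the statistics function change; the melted data frame is reused for every table. With three crossed factors some cells contain a single car, in which case sd() returns NA for that cell.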