A Function for Calculating Tidy Summaries of Multiple t-tests
The t-test is one of the most often used statistical tests in psychology and other social sciences. In APA format, researchers are told to report the means and standard deviations of both conditions; the t-statistic, its degrees of freedom, and its p-value; and an effect size with a confidence interval (generally Cohen's d and the 95% interval).
Researchers frequently conduct randomized experiments with not just one dependent variable, but many. And they may want to make sure that other variables, such as age, do not differ by condition.
The following function will return all the necessary information from these t-tests. I don't have it in a package yet (and don't know where I'd put it; drop me a line if you think it fits in an existing package, as I'd be happy to contribute it). For now, you'll have to copy, paste, and run the following in your script to use it.
t_table <- function(data, dvs, iv, var_equal = TRUE, p_adj = "none") {
  
  # input checks: a data.frame, all named columns present,
  # numeric dependent variables, and a two-level independent variable
  if (!inherits(data, "data.frame")) {
    stop("data must be a data.frame")
  }
  if (!all(c(dvs, iv) %in% names(data))) {
    stop("at least one column given in dvs and iv are not in the data")
  }
  if (!all(sapply(data[, dvs], is.numeric))) {
    stop("all dvs must be numeric")
  }
  if (length(unique(na.omit(data[[iv]]))) != 2) {
    stop("independent variable must only have two unique values")
  }
  
  # run one t-test per dependent variable
  out <- lapply(dvs, function(x) {
    tres <- t.test(data[[x]] ~ data[[iv]], var.equal = var_equal)
    
    # means and standard deviations by condition
    mns <- tapply(data[[x]], data[[iv]], mean, na.rm = TRUE)
    names(mns) <- paste0(names(mns), "_m")
    sds <- tapply(data[[x]], data[[iv]], sd, na.rm = TRUE)
    names(sds) <- paste0(names(sds), "_sd")
    
    # Cohen's d and its confidence interval from the noncentrality parameter;
    # $Lower and $Upper partially match ci.smd()'s Lower.Conf.Limit.smd
    # and Upper.Conf.Limit.smd list elements
    es <- MBESS::ci.smd(ncp = tres$statistic,
                        n.1 = table(data[[iv]])[[1]],
                        n.2 = table(data[[iv]])[[2]])
    
    c(
      c(mns[1], sds[1], mns[2], sds[2]),
      tres$statistic,
      tres$parameter,
      p = tres$p.value,
      d = unname(es$smd),
      d_lb = es$Lower,
      d_ub = es$Upper
    )
  })
  
  # bind rows into a data.frame, label the rows, clean up names,
  # and adjust p-values with the requested method
  out <- as.data.frame(do.call(rbind, out))
  out <- cbind(variable = dvs, out)
  names(out) <- gsub("[^0-9A-Za-z_]", "", names(out))
  out$p <- p.adjust(out$p, p_adj)
  
  return(out)
}
The arguments specify, in order: a data.frame where the data reside; a character vector of the names of the dependent variables; a string naming the independent variable; a logical value for whether or not to assume variances are equal across conditions (defaulting to TRUE for a classic t-test); and a string indicating what p-value adjustment to apply. See ?p.adjust.methods for more information on which methods are available; the default is no adjustment. (The function with full documentation in {roxygen2} format can be found at my GitHub.) Note that this function depends on the {MBESS} package, so make sure to have that installed first (but you don't need to call library on it, since it is accessed via MBESS::).
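As a quick reference, you can list the adjustment methods that p_adj accepts and check that {MBESS} is installed before running the function. The requireNamespace() check below is just one way to do this:

# the adjustment methods p_adj accepts (a constant in the stats package)
p.adjust.methods
## [1] "holm"       "hochberg"   "hommel"     "bonferroni" "BH"        
## [6] "BY"         "fdr"        "none"

# install {MBESS} if it isn't available; t_table only calls MBESS::ci.smd(),
# so no library() call is needed afterward
if (!requireNamespace("MBESS", quietly = TRUE)) install.packages("MBESS")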
What does it look like in action? Let's imagine a ctl and an exp condition, with the dependent variables y1, y2, and so on through y10, and a sample size of 128. I simulate those data below, where y1 and y2 have true effects of Cohen's d = 0.5 and 0.8, respectively.
set.seed(1839)

cond <- rep(c("ctl", "exp"), each = 64)
y1 <- rnorm(128, ifelse(cond == "ctl", 0, .5))
y2 <- rnorm(128, ifelse(cond == "ctl", 0, .8))
dat <- as.data.frame(lapply(1:8, function(zzz) rnorm(128)))
dat <- cbind(cond, y1, y2, dat)
names(dat)[4:11] <- paste0("y", 3:10)

dat[1:5, 1:5]

##   cond         y1         y2          y3         y4
## 1  ctl  1.0127014 -1.6556888  2.61696223  0.3817117
## 2  ctl -0.6845605  0.8893057  0.05602809 -1.6996460
## 3  ctl  0.3492607 -0.4439924  0.33997464  0.8473431
## 4  ctl -1.6245010  1.2612491 -0.99240679 -0.2083059
## 5  ctl -0.5162476  0.2012927 -0.96291759 -0.2948407
We can then feed the necessary information to the t_table function. Note that, instead of typing out all the y1 through y10 column names, I use the paste0 function to generate them in less code. I don't do any rounding for you inside of the function, but for the sake of presentation, I round to 2 decimal places here.
result <- t_table(dat, paste0("y", 1:10), "cond")
result[-1] <- lapply(result[-1], round, 2)
result

##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.04 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 0.45  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.15  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.06  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 0.94 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 0.77  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 0.58  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.00  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 0.56  0.10 -0.24  0.45
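If you want the result as ready-to-paste APA text, a small helper can format a row of this table with sprintf. This is a hypothetical add-on of my own, not part of t_table, and it assumes integer degrees of freedom (i.e., a classic rather than Welch t-test):

# format one row of the t_table result as an APA-style string;
# %d assumes df is a whole number, as it is when var_equal = TRUE
apa_line <- function(row) {
  p_txt <- if (row$p < .001) "p < .001" else sprintf("p = %.3f", row$p)
  sprintf("t(%d) = %.2f, %s, d = %.2f, 95%% CI [%.2f, %.2f]",
          row$df, row$t, p_txt, row$d, row$d_lb, row$d_ub)
}

apa_line(result[1, ])
## [1] "t(126) = -2.04, p = 0.040, d = -0.36, 95% CI [-0.71, -0.01]"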
Note that we got a false positive here: y9 reached significance even though it was simulated as pure noise. That is where p-value adjustments come in, if you wish to use them. Let's now say I use the Holm adjustment.
result2 <- t_table(dat, paste0("y", 1:10), "cond", p_adj = "holm")
result2[-1] <- lapply(result2[-1], round, 2)
result2

##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.35 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 1.00  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.91  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.39  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 1.00 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 1.00  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 1.00  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.02  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 1.00  0.10 -0.24  0.45
But note that the widths of the confidence intervals are not adjusted to match.
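If you want interval widths consistent with a multiple-comparison correction, one option is to widen the confidence level yourself. Here is a minimal sketch, my own addition rather than part of t_table, that uses the conf.level argument of MBESS::ci.smd() to get a Bonferroni-consistent interval for y2, assuming ten tests at an overall alpha of .05:

# Bonferroni-consistent CI for y2's standardized mean difference:
# with 10 tests and overall alpha = .05, use a 1 - .05/10 = .995 level
tres <- t.test(y2 ~ cond, data = dat, var.equal = TRUE)
MBESS::ci.smd(ncp = tres$statistic,
              n.1 = sum(dat$cond == "ctl"),
              n.2 = sum(dat$cond == "exp"),
              conf.level = 1 - .05 / 10)

The returned interval will be wider than the 95% one shown in the tables above; t_table could be extended to take a conf.level argument and pass it along in the same way.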