
A Function for Calculating Tidy Summaries of Multiple t-tests

[This article was first published on Mark H. White II, PhD, and kindly contributed to R-bloggers.]

The t-test is one of the most frequently used statistical tests in psychology and other social sciences. In APA format, researchers are expected to report the means and standard deviations of both conditions; the t-statistic, its degrees of freedom, and its p-value; and an effect size with a confidence interval (generally Cohen's d with a 95% interval).

Researchers frequently run randomized experiments with not just one dependent variable but many, and they may also want to check that other variables, such as age, do not differ by condition.

The following function returns all of the necessary information from multiple t-tests at once. I don't have it in a package yet, and I don't know where I would put it; drop me a line if you think it fits in an existing package, and I would be happy to contribute it. For now, you'll have to copy, paste, and run the following in your script to use it.

t_table <- function(data, dvs, iv, var_equal = TRUE, p_adj = "none") {
  
  # validate the inputs
  if (!inherits(data, "data.frame")) {
    stop("data must be a data.frame")
  }
  
  if (!all(c(dvs, iv) %in% names(data))) {
    stop("at least one column given in dvs and iv is not in the data")
  }
  
  if (!all(sapply(data[, dvs], is.numeric))) {
    stop("all dvs must be numeric")
  }
  
  if (length(unique(na.omit(data[[iv]]))) != 2) {
    stop("independent variable must only have two unique values")
  }
  
  # run one t-test per dependent variable
  out <- lapply(dvs, function(x) {
    
    tres <- t.test(data[[x]] ~ data[[iv]], var.equal = var_equal)
    
    # means and standard deviations by condition
    mns <- tapply(data[[x]], data[[iv]], mean, na.rm = TRUE)
    names(mns) <- paste0(names(mns), "_m")
    
    sds <- tapply(data[[x]], data[[iv]], sd, na.rm = TRUE)
    names(sds) <- paste0(names(sds), "_sd")
    
    # standardized mean difference (Cohen's d) with its confidence interval
    es <- MBESS::ci.smd(ncp = tres$statistic, 
                        n.1 = table(data[[iv]])[[1]], 
                        n.2 = table(data[[iv]])[[2]])
    
    c(
      c(mns[1], sds[1], mns[2], sds[2]),
      tres$statistic,
      tres$parameter,
      p = tres$p.value,
      d = unname(es$smd),
      d_lb = es$Lower,
      d_ub = es$Upper
    )
  })
  
  # bind into one row per dependent variable, with clean column names
  out <- as.data.frame(do.call(rbind, out))
  out <- cbind(variable = dvs, out)
  names(out) <- gsub("[^0-9A-Za-z_]", "", names(out))
  
  # apply the requested p-value adjustment (default: none)
  out$p <- p.adjust(out$p, p_adj)
  
  return(out)
}

The arguments specify, in order: a data.frame where the data reside; a character vector naming the dependent variables; a string naming the independent variable; a logical value indicating whether to assume equal variances across conditions (defaults to TRUE for a classic Student's t-test); and a string indicating which p-value adjustment to apply. See ?p.adjust.methods for the available methods; the default is no adjustment. (The function with full documentation in {roxygen2} format can be found at my GitHub.) Note that this function depends on the {MBESS} package, so make sure to have that installed first (but you don't need to call library on it, since it is accessed with MBESS::).
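Since {MBESS} is only needed for the effect-size confidence intervals, a minimal sketch of that one-time setup (standard base R, nothing specific to this function) looks like:

# install {MBESS} once if it isn't already available; t_table only touches
# it through MBESS::ci.smd, so there is no need to attach it with library()
if (!requireNamespace("MBESS", quietly = TRUE)) {
  install.packages("MBESS")
}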

What does it look like in action? Let's imagine a ctl and an exp condition; dependent variables y1, y2, and so on through y10; and a sample size of 128. I simulate those data below, where only y1 and y2 have true effects, with population Cohen's d values of 0.5 and 0.8, respectively.

set.seed(1839)
cond <- rep(c("ctl", "exp"), each = 64)        # 64 participants per condition
y1 <- rnorm(128, ifelse(cond == "ctl", 0, .5)) # true effect, d = 0.5
y2 <- rnorm(128, ifelse(cond == "ctl", 0, .8)) # true effect, d = 0.8
dat <- as.data.frame(lapply(1:8, function(zzz) rnorm(128))) # y3 to y10: no true effect
dat <- cbind(cond, y1, y2, dat)
names(dat)[4:11] <- paste0("y", 3:10)
dat[1:5, 1:5]
##   cond         y1         y2          y3         y4
## 1  ctl  1.0127014 -1.6556888  2.61696223  0.3817117
## 2  ctl -0.6845605  0.8893057  0.05602809 -1.6996460
## 3  ctl  0.3492607 -0.4439924  0.33997464  0.8473431
## 4  ctl -1.6245010  1.2612491 -0.99240679 -0.2083059
## 5  ctl -0.5162476  0.2012927 -0.96291759 -0.2948407

We can then feed the necessary information to the t_table function. Note that, instead of typing out all of the y1 through y10 column names, I use paste0 to generate them in less code. The function does no rounding for you, but for the sake of presentation, I round to two decimal places here.

result <- t_table(dat, paste0("y", 1:10), "cond")
result[-1] <- lapply(result[-1], round, 2)
result
##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.04 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 0.45  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.15  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.06  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 0.94 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 0.77  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 0.58  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.00  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 0.56  0.10 -0.24  0.45
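Since the goal is APA-style reporting, here is a small sketch of how you might format one row of this output into reportable text. The apa_line helper is my own hypothetical addition, not part of t_table; it takes the unrounded output, since sprintf handles the rounding itself.

# hypothetical helper: format one row of t_table output as an APA-style string
apa_line <- function(row) {
  sprintf(
    "ctl: M = %.2f, SD = %.2f; exp: M = %.2f, SD = %.2f; t(%.0f) = %.2f, p = %.3f, d = %.2f, 95%% CI [%.2f, %.2f]",
    row$ctl_m, row$ctl_sd, row$exp_m, row$exp_sd,
    # %.0f works here because var_equal = TRUE gives integer df;
    # a Welch test would have fractional df, so you'd want %.1f instead
    row$df, row$t, row$p, row$d, row$d_lb, row$d_ub
  )
}
unrounded <- t_table(dat, paste0("y", 1:10), "cond")
apa_line(unrounded[unrounded$variable == "y2", ])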

Note that the results above include a false positive: y9 shows a significant effect even though only y1 and y2 have true effects. This is where p-value adjustments come in, if you wish to use them. Let's now apply the Holm adjustment.

result2 <- t_table(dat, paste0("y", 1:10), "cond", p_adj = "holm")
result2[-1] <- lapply(result2[-1], round, 2)
result2
##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.35 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 1.00  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.91  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.39  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 1.00 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 1.00  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 1.00  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.02  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 1.00  0.10 -0.24  0.45

But note that the widths of the confidence intervals are not adjusted here; only the p-values are.
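If you want interval widths consistent with a multiplicity correction, one option, sketched below rather than built into t_table, is to call MBESS::ci.smd directly with a Bonferroni-adjusted confidence level (conf.level is an argument ci.smd already accepts):

# sketch: a Bonferroni-consistent interval for the y2 effect, using the
# t-statistic (-4.48) and group sizes (64 each) from the results above;
# 1 - .05/10 widens the 95% interval to 99.5% to account for ten tests
MBESS::ci.smd(ncp = -4.48, n.1 = 64, n.2 = 64, conf.level = 1 - .05 / 10)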

