Site icon R-bloggers

Power Curves in R Using Plotly ggplot2 Library

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When performing Student’s t-test to compare the difference in means between two groups, it is a useful exercise to determine the effect of unequal sample sizes in the comparison groups on power. Formally, power can be defined as the probability of rejecting the null hypothesis when the alternative hypothesis is true. Informally, power is the ability of a statistical test to detect an effect, if the effect actually exists. Large imbalances generally will not have adequate statistical power to detect even large effect sizes associated with a factor, leading to a high Type II error rate.

To jusity this reasoning I performed a power analysis for different group sizes. I considered the following group sizes, where n1 are the number of subjects in group 1 and n2 are the number of subjects in group 2:

  1. n1 = 28, n2 = 1406: n1 represents 2% of the entire sample size of 1434
  2. n1 = 144, n2 = 1290: n1 represents 10% of the entire sample size of 1434
  3. n1 = 287, n2 = 1147: n1 represents 20% of the entire sample size of 1434
  4. n1 = 430, n2 = 1004: n1 represents 30% of the entire sample size of 1434
  5. n1 = 574, n2 = 860: n1 represents 40% of the entire sample size of 1434
  6. n1 = 717, n2 = 717: equal size groups (this is optimal because it leads to the highest power for a given effect size)

In the figure below I plotted the power curves for the t-test, as a function of the effect size, assuming a Type I error rate of 5%. Comparing different power curves (based on the sample size of each group) on the same plot, is a useful visual representation of this analysis. We also plot a horizontal dashed line at an acceptable power level of 80%, and a vertical line at the effect size that would have to be present in our data to achieve 80% power. We see that the effect size must be greater than 0.54 to attain an acceptable power level given highly imbalanced group sizes of n1 = 28 and n2 = 1406, compared to all other scenarios that lead to 100% power.

library(pwr) # for power calcs
library(dplyr) # for data manipulation
library(tidyr) # for data manipulation
library(ggplot2) # for plotting power curves
library(plotly) # for interactive power curves

# Generate power calculations
ptab <- cbind(NULL, NULL)       

for (i in seq(0,1, length.out = 200)){
  pwrt1 <- pwr.t2n.test(n1 = 28, n2 = 1406, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  pwrt2 <- pwr.t2n.test(n1 = 144, n2 = 1290, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  pwrt3 <- pwr.t2n.test(n1 = 287, n2 = 1147, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  pwrt4 <- pwr.t2n.test(n1 = 430, n2 = 1004, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  pwrt5 <- pwr.t2n.test(n1 = 574, n2 = 860, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  pwrt6 <- pwr.t2n.test(n1 = 717, n2 = 717, 
                        sig.level = 0.05, power = NULL, 
                        d = i, alternative="two.sided")
  ptab <- rbind(ptab, cbind(pwrt1$d, pwrt1$power,
                            pwrt2$d, pwrt2$power,
                            pwrt3$d, pwrt3$power,
                            pwrt4$d, pwrt4$power,
                            pwrt5$d, pwrt5$power,
                            pwrt6$d, pwrt6$power))
}

ptab <- cbind(seq_len(nrow(ptab)), ptab)

colnames(ptab) <- c("id","n1=28, n2=1406.effect size","n1=28, n2=1406.power",
                    "n1=144, n2=1290.effect size","n1=144, n2=1290.power",
                    "n1=287, n2=1147.effect size","n1=287, n2=1147.power",
                    "n1=430, n2=1004.effect size","n1=430, n2=1004.power",
                    "n1=574, n2=860.effect size","n1=574, n2=860.power",
                    "n1=717, n2=717.effect size","n1=717, n2=717.power")

# get data into right format for ggplot2
temp <- ptab %>%
  as.data.frame() %>%
  gather(key = name, value = val, 2:13) %>%
  separate(col = name, into = c("group", "var"), sep = "\.") %>%
  spread(key = var, value = val)

# factor group
temp$group <- factor(temp$group, 
                levels = c("n1=28, n2=1406", "n1=144, n2=1290", 
                "n1=287, n2=1147", "n1=430, n2=1004",
                "n1=574, n2=860", "n1=717, n2=717"))


# plot
p <- ggplot(temp, aes(x = `effect size`, y = power, color = group))
     geom_line(size=2) + 
     theme_bw() + 
     theme(axis.text=element_text(size=14), 
           axis.title=element_text(size=14), 
           legend.text=element_text(size=14)) +
     geom_vline(xintercept = .54, linetype = 2) +
     geom_hline(yintercept = 0.80, linetype = 2)

# so simple to make interactive plots
plotly::ggplotly(p)

To leave a comment for the author, please follow the link and comment on their blog: R – Modern Data.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.