Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post we are going to explore the relationship between sample size (n) and statistical significance for the chi-squared (
where z is the test statistic and:
where s is the sample standard deviation. Typically our null is zero in which case we reject the null hypothesis when the confidence does not span zero. If we wish to construct a 95% confidence interval, then
# Define some parameters sig_level <- .95 # Significance level, 95% here es <- 0.15 # Effect size in standard units null_val <- 0 # null value #' Calculate the standard error #' #' This function will calculate the standard error from a vector of observations or with a given #' sample standard deviation and sample size. #' #' @param x numeric vector of observations. #' @param sigma the sample standard deviation. #' @param n sample size. standard_error <- function(x, sigma = sd(x), n = length(x)) { if(!missing(x)) { # Some basic error checking if(sigma != sd(x)) { warning('The sample standard deviation (sigma) is not equal to sd(x)')} if(n != length(x)) { warning('The sample size (n) is not equal to length(x).' )} } return(sigma / sqrt(n)) } # Create a data.frame with varying sample sizes and the corresponding standard error df <- data.frame( n = 10:400, se = standard_error(sigma = 1, n = 10:400) ) cv <- abs(qnorm((1 - sig_level) / 2)) # Critical value (z test statistic) df$ci_low <- es - cv * df$se df$ci_high <- es + cv * df$se df$sig <- null_val < df$ci_low | null_val > df$ci_high min_n <- df$n[df$sig] |> min() ggplot(df, aes(x = n, y = se, color = sig)) + geom_path() + geom_point() + scale_color_brewer(paste0('p < ', (1 - sig_level)), type = 'qual', palette = 6) + ggtitle(paste0('Minumum n for p < ', (1 - sig_level), ': ', min_n), subtitle = paste0('effect size: ', es, '; null value: ', null_val))
The chi-squared (
where O is the observed count and E is the expected count. Unlike the standard error for numerical data, n is not explicitly in the formula and therefore makes it a bit more challenging to determine the impact sample size has rejecting the null hypothesis. Moreover, since the chi-squared is calculated from the cell counts in a table of varying length and dimension (one- or two-dimensions specifically) determining how n impacts rejecting the null or not requires more parameters.
Answering the question of how large does n need to be to detect a statistically significant result (i.e. to reject the null hypothesis) is refereed to as power. Whereas for calculating the power for numerical data had one parameter, the sample standard deviation, here we need to consider the proportion of observations within different cells. For example, consider we have a variable with three levels and we expect the proportion of observations in the three groups to be 33%, 25%, and 42%, respectively. If our sample size is 100 then we expect there to be 33, 25, and 42 and observations for the three categories. This function will, for varying sample sizes, calculate the counts for the categories to achieve that sample size, estimate the chi-squared statistic and record the p-value. There are other parameters that are documented below. A plot
function is also defined using the S3 objected oriented framework.
#' Calculate p-value from a chi-squared test with varying sample sizes #' #' This algorithm will start with an initial sample size (`n_start`) and perform a chi-squared test #' with a vector of counts equal to `n * probs`. This will repeat increasing the sample size by #' `n_step` until the p-value from the chi-squared test is less than `p_stop`. #' #' @param vector of cell probabilities. The sum of the values must equal 1. #' @param sig_level significance level. #' @param p_stop the p-value to stop estimating chi-squared tests. #' @param max_n maximum n to attempt if `p_value` is never less than `p_stop`. #' @param min_cell_size minimum size per cell to perform the chi-square test. #' @param n_start the starting sample size. #' @param n_step the increment for each iteration. #' @return a data.frame with three columns: n (sample size), p_value, and sig (TRUE if #' p_value < sig_level). #' @importFrom DescTools power.chisq.test CramerV chi_squared_power <- function( probs, sig_level = 0.05, p_stop = 0.01, power = 0.80, power_stop = 0.90, max_n = 100000, min_cell_size = 10, n_start = 10, n_step = 10 ) { if(sum(probs) != 1) { # Make sure the sum is equal to 1 stop('The sum of the probabilities must equal 1.') } else if(length(unique(probs)) == 1) { stop('All the probabilities are equal.') } n <- n_start p_values <- numeric() power_values <- numeric() df <- ifelse(is.vector(probs), length(probs) - 1, min(dim(probs)) - 1) # Degrees of freedom repeat { x <- (probs * n) |> round() if(all(x > min_cell_size)) { cs <- chisq.test(x, rescale.p = TRUE, simulate.p.value = FALSE) p_values[n / n_step] <- cs$p.value pow <- DescTools::power.chisq.test( n = n, w = DescTools::CramerV(as.table(x)), df = df, sig.level = sig_level ) power_values[n / n_step] <- pow$power if((cs$p.value < p_stop & pow$power > power_stop) | n > max_n) { break; } } else { p_values[n / n_step] <- NA power_values[n / n_step] <- NA } n <- n + n_step } result <- data.frame(n = seq(10, length(p_values) * n_step, n_step), p_value = p_values, sig = p_values < sig_level, power = power_values) class(result) <- c('chisqpower', 'data.frame') attr(result, 'probs') <- probs attr(result, 'sig_level') <- sig_level attr(result, 'p_stop') <- p_stop attr(result, 'power') <- power attr(result, 'power_stop') <- power_stop attr(result, 'max_n') <- max_n attr(result, 'n_step') <- n_step return(result) } #' Plot the results of chi-squared power estimation #' #' @param x result of [chi_squared_power()]. #' @param plot_power whether to plot the power curve. #' @param plot_p whether to plot p-values. #' @param digits number of digits to round to. #' @param segement_color color of the lines marking where power and p values exceed threshold. #' @param sgement_linetype linetype of the lines marking where power and p values exceed threshold. #' @param p_linetype linetype for the p-values. #' @param power_linetype linetype for the power values. #' @param title plot title. If missing a title will be automatically generated. #' @parma ... currently not used. #' @return a ggplot2 expression. plot.chisqpower <- function( x, plot_power = TRUE, plot_p = TRUE, digits = 4, segment_color = 'grey60', segment_linetype = 1, p_linetype = 1, power_linetype = 2, title, ... ) { pow <- attr(x, 'power') p <- ggplot(x[!is.na(x$p_value),], aes(x = n, y = p_value)) if(plot_power) { if(any(x$power > pow, na.rm = TRUE)) { min_n_power <- min(x[x$power > pow,]$n, na.rm = TRUE) p <- p + geom_segment( x = 0, xend = min_n_power, y = pow, yend = pow, color = segment_color, linetype = segment_linetype) + ggplot2::annotate( geom = 'text', x = 0, y = pow, label = paste0('Power = ', pow), vjust = -1, hjust = 0) + geom_segment( x = min_n_power, xend = min_n_power, y = pow, yend = 0, color = segment_color, linetype = segment_linetype) + ggplot2::annotate( geom = 'text', x = min_n_power, y = 0, label = paste0('n = ', prettyNum(min_n_power, big.mark = ',')), vjust = 1, hjust = -0.1) } p <- p + geom_path( aes(y = power), color = '#7570b3', linetype = power_linetype) } if(plot_p) { if(any(x$sig, na.rm = TRUE)) { p <- p + geom_segment( x = 0, xend = min(x[x$sig,]$n, na.rm = TRUE), y = attr(x, 'sig_level'), yend = attr(x, 'sig_level'), color = segment_color, linetype = segment_linetype) + ggplot2::annotate( geom = 'text', x = 0, y = attr(x, 'sig_level'), label = paste0('p = ', attr(x, 'sig_level')), vjust = -1, hjust = 0) + geom_segment( x = min(x[x$sig,]$n, na.rm = TRUE), xend = min(x[x$sig,]$n, na.rm = TRUE), y = attr(x, 'sig_level'), yend = 0, color = segment_color, linetype = segment_linetype) + ggplot2::annotate( geom = 'text', x = min(x[x$sig,]$n, na.rm = TRUE), y = 0, label = paste0('n = ', prettyNum(min(x[x$sig,]$n, na.rm = TRUE), big.mark = ',')), vjust = 1, hjust = -0.1) } p <- p + geom_path( alpha = 0.7, linetype = p_linetype) # geom_point(aes(color = sig), size = 1) + # scale_color_brewer(paste0('p < ', attr(x, 'sig_level')), type = 'qual', palette = 6) } if(missing(title)) { if(any(x$power > pow, na.rm = TRUE) & any(x$sig, na.rm = TRUE)) { min_n <- min(x[x$sig & x$power > pow,]$n, na.rm = TRUE) title <- paste0('Smallest n where p < ', attr(x, 'sig_level'), ' and power > ', pow, ': ', prettyNum(min_n, big.mark = ',')) } else { title <- paste0('No n found where p < ', attr(x, 'sig_level'), ' and power > ', pow) } } p <- p + ylim(c(0, 1)) + ylab('') + xlab('Sample Size') + ggtitle(title, subtitle = paste0('Probabilities: ', paste0(round(attr(x, 'probs'), digits = digits), collapse = ', '))) return(p) }
Returning to our example above where the cell proportions are 33%, 25%, and 42%, we would need
csp1 <- chi_squared_power(probs = c(.33, .25, .42)) csp1[csp1$sig,]$n |> min(na.rm = TRUE) # Smallest n that results in p < 0.05
[1] 130
plot(csp1)
In the next example we have much smaller differences between the cells with 25%, 25%, 24%, and 26%. In this example
csp3 <- chi_squared_power(probs = c(.25, .25, .24, .26), max_n = 20000) csp3[csp3$sig,]$n |> min(na.rm = TRUE) # Smallest n that results in p < 0.05
[1] 9710
plot(csp3)
This function will work with two-dimensional data as well (i.e. across two variables). The following example from Agresti (2007) looks at the political affiliation across sex (see the help documentation for chisq.test()
.).
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477))) dimnames(M) <- list(gender = c("Femal", "Male"), party = c("Democrat", "Independent", "Republican")) M
party gender Democrat Independent Republican Femal 762 327 468 Male 484 239 477
sum(M)
[1] 2757
The chi-squared test suggests we should reject the null hypothesis test.
chisq.test(M)
Pearson's Chi-squared test data: M X-squared = 30.07, df = 2, p-value = 2.954e-07
DescTools::CramerV(M) # Effect size
[1] 0.1044358
DescTools::power.chisq.test(n = sum(M), w = DescTools::CramerV(M), df = min(dim(M)) - 1, sig.level = 1 - sig_level)
Chi squared power calculation w = 0.1044358 n = 2757 df = 1 sig.level = 0.05 power = 0.9997872 NOTE: n is the number of observations
Agresti had a sample size of 2757, but we can ask the question what is the minimum sample size would they need to detect statistical significance? First, we convert the counts to proportions, then we can use the chi_squared_power()
function to find the minimum sample size to reject the null hypothesis test.
M_prob <- M / sum(M) # Convert the counts to percentages csp4 <- chi_squared_power(probs = M_prob) plot(csp4)
For a more robust application for estimating power for many statistical tests, check out the pwsrr R package and corresponding Shiny application.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.