Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
For the final chapter of my dissertation, I had examined the effects of ordinal class rank on the academic achievement of Danish primary school students (following a popular identification strategy introduced in a paper by Murphy and Weinhard). Because of the richness of the Danish register data, I had a large number of potential outcome variables at my disposal, and as a result, I was able to examine literally all the outcomes that the previous literature had covered in individual studies – the effect of rank on achievement, personality, risky behaviour, mental health, parental investment, etc. – in one paper.
But with (too) many outcome variables comes a risk: inflated type 1 error rates, or an increased risk of ‘false positives’. So I was wondering: were all the significant effects I estimated – shown in the figure above – “real”, or was I simply being fooled by randomness?
A common way to control the risk of false positive when testing multiple hypothesis is to use methods that control the ‘family-wise’ error rate, i.e. the risk of at least one false positive in a family of S hypotheses.
Among such methods, Romano and Wolf’s correction is particularly popular, because it is “uniformly the most powerful”. Without going into too much detail, I’ll just say that if you have a choice between a number of methods that control the family-wise error rate at a desired level \(\alpha\), you might want to choose the “most powerful” one, i.e. the one that has the highest success rate of finding a true effect.
Now, there is actually an amazing Stata package for the Romano-Wolf method called rwolf
.
But this is an R blog, and I’m an R guy … In addition, my regression involved several million rows and some high-dimensional fixed effects, and rwolf
quickly showed its limitations. It just wasn’t fast enough!
While playing around with the rwolf
package, I finally did my due diligence on the method it implements, and after a little background reading, I realized that both the Romano and Wolf method – as well as its main rival, the method proposed by Westfall and Young – are based on the bootstrap!
But wait! Had I not just spent a lot of time porting a super-fast bootstrap algorithm from R to Stata? Could I not use Roodman et al’s “fast and wild” cluster bootstrap algorithm for bootstrap-based multiple hypothesis correction?
Of course it was inevitable: I ended up writing an R package. Today I am happy to present the first MVP version of the wildwrwolf
package!
The wildrwolf package
You can simply install the package from github or r-universe by typing
# install.packages("devtools") devtools::install_github("s3alfisc/wildrwolf") # from r-universe (windows & mac, compiled R > 4.0 required) install.packages('wildrwolf', repos ='https://s3alfisc.r-universe.dev')
The Romano Wolf correction in wildrwolf
is implemented as a post-estimation commands for multiple estimation objects from the fabulous fixest
package.
To demonstrate how wildrwolf's
main function, rwolf
, works, let’s first simulate some data:
library(wildrwolf) library(fixest) set.seed(1412) library(wildrwolf) library(fixest) set.seed(1412) N <- 5000 X1 <- rnorm(N) X2 <- rnorm(N) rho <- 0.5 sigma <- matrix(rho, 4, 4); diag(sigma) <- 1 u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma) Y1 <- 1 + 1 * X1 + X2 Y2 <- 1 + 0.01 * X1 + X2 Y3 <- 1 + 0.4 * X1 + X2 Y4 <- 1 + -0.02 * X1 + X2 for(x in 1:4){ var_char <- paste0("Y", x) assign(var_char, get(var_char) + u[,x]) } group_id <- sample(1:100, N, TRUE) data <- data.frame(Y1 = Y1, Y2 = Y2, Y3 = Y3, Y4 = Y4, X1 = X1, X2 = X2, group_id = group_id, splitvar = sample(1:2, N, TRUE))
We now estimate a regression via the fixest
package:
fit <- feols(c(Y1, Y2, Y3, Y4) ~ csw(X1,X2), data = data, cluster = ~group_id, ssc = ssc(cluster.adj = TRUE)) # clean workspace except for res & data rm(list= ls()[!(ls() %in% c('fit','data'))])
For all eight estimated regressions, we want to apply the Romano-Wolf correction to the parameter of interest, X1
. We simply need to provide the regression object of type fixest_multi
(it is also possible to simply provide a list of objects of type fixest
), the parameter of interest, the number of bootstrap draws, and possibly a seed (for replicability). That’s it! If the regressions use clustered standard errors, rwolf
will pick this up and run a wild cluster bootstrap, otherwise just a robust wild bootstrap.
pracma::tic() res_rwolf <- wildrwolf::rwolf( models = fit, param = "X1", B = 9999, seed = 23 ) ## | | | 0% | |========= | 12% | |================== | 25% | |========================== | 38% | |=================================== | 50% | |============================================ | 62% | |==================================================== | 75% | |============================================================= | 88% | |======================================================================| 100% pracma::toc() ## elapsed time is 3.980000 seconds
For \(N=5000\) observations with \(G=100\) clusters, \(S=8\) hypothesis and \(B=9999\) bootstrap draws, the calculation of Romano-Wolf corrected p-values takes less than 5 seconds. If you ask me, that is pretty fast! =) 🚀
We can now compare the results of rwolf
with the uncorrected p-values and another popular multiple testing adjustment, Holm’s method.
pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist() df <- data.frame( "uncorrected" = pvals, "Holm" = p.adjust(pvals, method = "holm"), "rwolf" = res_rwolf$pval ) rownames(df) <- NULL round(df, 3) ## uncorrected Holm rwolf ## 1 0.000 0.000 0.000 ## 2 0.000 0.000 0.000 ## 3 0.140 0.420 0.367 ## 4 0.033 0.132 0.128 ## 5 0.000 0.000 0.000 ## 6 0.000 0.000 0.000 ## 7 0.398 0.420 0.394 ## 8 0.152 0.420 0.367
Both Holm’s method and rwolf
produce similar corrected p-values, which – in general – are larger than the uncorrected p-values.
But does it actually work? Monte Carlo Experiments
We test \(S=6\) hypotheses and generate data as
\[Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} \] where \(D_i = 1(U_i > 0.5)\) and \(U_i\) is drawn from a uniform distribution, \(u_{i,g}\) is a cluster level shock with intra-cluster correlation \(0.5\), and the idiosyncratic error term is drawn from a multivariate random normal distribution with mean \(0_S\) and covariance matrix
S <- 6 rho <- 0.5 Sigma <- matrix(1, 6, 6) diag(Sigma) <- rho Sigma
This experiment imposes a data generating process as in equation (9) in Clarke, Romano and Wolf, with an additional error term \(u_g\) for \(G=20\) clusters and intra-cluster correlation 0.5 and \(N=1000\) observations.
You can run the simulations via the run_fwer_sim()
function attached
in the package.
# note that this will take some time res <- run_fwer_sim( seed = 232123, n_sims = 500, B = 499, N = 1000, s = 6, rho = 0.5 #correlation between hypotheses, not intra-cluster! ) # > res # reject_5 reject_10 rho # fit_pvalue 0.274 0.460 0.5 # fit_pvalue_holm 0.046 0.112 0.5 # fit_padjust_rw 0.052 0.110 0.5
Both Holm’s method and wildrwolf
control the family wise error rates, at both the 5 and 10% significance level. Very cool!
The method by Westfall and Young
A package for Westfall and Young’s (1993) method is currently in development. I am not yet fully convinced that it works as intented – in simulations, I regularly find that wildwyoung
overrejects.
Literature
Clarke, Damian, Joseph P. Romano, and Michael Wolf. “The Romano–Wolf multiple-hypothesis correction in Stata.” The Stata Journal 20.4 (2020): 812-843.
Murphy, Richard, and Felix Weinhardt. “Top of the class: The importance of ordinal rank.” The Review of Economic Studies 87.6 (2020): 2777-2826.
Romano, Joseph P., and Michael Wolf. “Stepwise multiple testing as formalized data snooping.” Econometrica 73.4 (2005): 1237-1282.
Roodman, David, et al. “Fast and wild: Bootstrap inference in Stata using boottest.” The Stata Journal 19.1 (2019): 4-60.
Westfall, Peter H., and S. Stanley Young. Resampling-based multiple testing: Examples and methods for p-value adjustment. Vol. 279. John Wiley & Sons, 1993.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.