Parallelizing simr::powercurve() in R

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The powercurve function from the simr package in R (Green & MacLeod, 2016) can incur very long running times when the method used for the calculation of p values is Kenward-Roger or Satterthwaite (see Luke, 2017). Here I suggest three ways for cutting down this time.

  1. Where possible, use a high-performance (or high-end) computing cluster. This removes the need to use personal computers for these long jobs.

  2. In case you’re using the fixed() parameter of the powercurve function, and calculating the power for different effects, run these at the same time (‘in parallel’) on different machines, rather than one after another.

  3. Parallelize the breaks argument. The breaks argument of the powercurve function allows the calculation of power for different levels of the grouping factor passed to along. Some grouping factors are participant, trial and item. The breaks argument sets the different sample sizes for which power will be calculated. Parallelizing breaks is done by running each number of levels in a separate function. When each has been run and saved, they are combined to allow the plotting. This procedure is demonstrated below.

Parallelizing breaks

Let’s do a minimal example using a toy lmer model. A power curve will be craeted for the fixed effect of x along different sample sizes of the grouping factor g.

Notice that the six sections of the power curve below are serially arranged, one after another. In contrast, to enable parallel processing, each power curve would be placed in a single script, and they would all be run at the same time.

Although the power curves below run in a few minutes, the settings that are often used (e.g., a larger model; fixed('x', 'sa') instead of fixed('x'); nsim = 500 instead of nsim = 50) take far longer. That is where parallel processing becomes useful.1

library(lme4)
library(simr)


# Toy model
fm = lmer(y ~ x + (x | g), data = simdata)

# Extend sample size of `g`
fm_extended_g = extend(fm, along = 'g', n = 12)

# Parallelize `breaks` by running each number of levels in a separate function.

# 4 levels of g
pwcurve_4g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 4, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 6 levels of g
pwcurve_6g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 6, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 8 levels of g
pwcurve_8g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 8, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 10 levels of g
pwcurve_10g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 10, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 12 levels of g
pwcurve_12g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 12, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)


Having saved each section of the power curve, we must now combine them to be able to plot them together.

# Create a destination object using any of the power curves above.
all_pwcurve = pwcurve_4g

# Combine results
all_pwcurve$ps = c(pwcurve_4g$ps[1], pwcurve_6g$ps[1], pwcurve_8g$ps[1], 
                   pwcurve_10g$ps[1], pwcurve_12g$ps[1])

# Combine the different numbers of levels.
all_pwcurve$xval = c(pwcurve_4g$nlevels, pwcurve_6g$nlevels, pwcurve_8g$nlevels, 
                     pwcurve_10g$nlevels, pwcurve_12g$nlevels)

print(all_pwcurve)
## Power for predictor 'x', (95% confidence interval),
## by number of levels in g:
##       4: 46.00% (31.81, 60.68) - 40 rows
##       6: 74.00% (59.66, 85.37) - 60 rows
##       8: 92.00% (80.77, 97.78) - 80 rows
##      10: 98.00% (89.35, 99.95) - 100 rows
##      12: 100.0% (92.89, 100.0) - 120 rows
## 
## Time elapsed: 0 h 0 m 7 s
plot(all_pwcurve, xlab = 'Levels of g')

# For reproducibility purposes
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] simr_1.0.5    lme4_1.1-27.1 Matrix_1.3-4 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7        lattice_0.20-44   tidyr_1.1.3       assertthat_0.2.1 
##  [5] digest_0.6.27     utf8_1.2.2        plyr_1.8.6        R6_2.5.1         
##  [9] cellranger_1.1.0  backports_1.2.1   evaluate_0.14     blogdown_1.5     
## [13] pillar_1.6.2      rlang_0.4.11      curl_4.3.2        readxl_1.3.1     
## [17] minqa_1.2.4       data.table_1.14.0 car_3.0-11        nloptr_1.2.2.2   
## [21] jquerylib_0.1.4   rmarkdown_2.11    splines_4.1.1     stringr_1.4.0    
## [25] foreign_0.8-81    broom_0.7.9       compiler_4.1.1    xfun_0.26        
## [29] pkgconfig_2.0.3   mgcv_1.8-36       htmltools_0.5.2   tidyselect_1.1.1 
## [33] tibble_3.1.4      binom_1.1-1       bookdown_0.24     rio_0.5.27       
## [37] fansi_0.5.0       crayon_1.4.1      dplyr_1.0.7       MASS_7.3-54      
## [41] grid_4.1.1        nlme_3.1-153      jsonlite_1.7.2    lifecycle_1.0.0  
## [45] DBI_1.1.1         magrittr_2.0.1    zip_2.2.0         stringi_1.7.4    
## [49] carData_3.0-4     bslib_0.3.0       ellipsis_0.3.2    vctrs_0.3.8      
## [53] generics_0.1.0    boot_1.3-28       openxlsx_4.2.4    iterators_1.0.13 
## [57] tools_4.1.1       forcats_0.5.1     glue_1.4.2        purrr_0.3.4      
## [61] hms_1.1.0         plotrix_3.8-2     parallel_4.1.1    abind_1.4-5      
## [65] pbkrtest_0.5.1    fastmap_1.1.0     yaml_2.2.1        RLRsim_3.1-6     
## [69] knitr_1.34        haven_2.4.3       sass_0.4.0


Just the code

library(lme4)
library(simr)

# Toy model
fm = lmer(y ~ x + (x | g), data = simdata)

# Extend sample size of `g`
fm_extended_g = extend(fm, along = 'g', n = 12)

# Parallelize `breaks` by running each number of levels in a separate function.

# 4 levels of g
pwcurve_4g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 4, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 6 levels of g
pwcurve_6g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 6, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 8 levels of g
pwcurve_8g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 8, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 10 levels of g
pwcurve_10g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 10, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)

# 12 levels of g
pwcurve_12g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 12, 
                        nsim = 50, seed = 123, 
                        # No progress bar
                        progress = FALSE)


# Create a destination object using any of the power curves above.
all_pwcurve = pwcurve_4g

# Combine results
all_pwcurve$ps = c(pwcurve_4g$ps[1], pwcurve_6g$ps[1], pwcurve_8g$ps[1], 
                   pwcurve_10g$ps[1], pwcurve_12g$ps[1])

# Combine the different numbers of levels.
all_pwcurve$xval = c(pwcurve_4g$nlevels, pwcurve_6g$nlevels, pwcurve_8g$nlevels, 
                     pwcurve_10g$nlevels, pwcurve_12g$nlevels)


print(all_pwcurve)

plot(all_pwcurve, xlab = 'Levels of g')


References

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), 9. http://doi.org/10.5334/joc.10

Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution 7(4), 493–498, https://doi.org/10.1111/2041-210X.12504

Kumle, L., Vo, M. L. H., & Draschkow, D. (2021). Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behavior Research Methods, 1–16. https://doi.org/10.3758/s13428-021-01546-0

Luke, S. G. (2017). Evaluating significance in linear mixed-effects models in R. Behavior Research Methods, 49(4), 1494–1502. https://doi.org/10.3758/s13428-016-0809-y



  1. The number of simulations set by nsim should be larger (Brysbaert & Stevens, 2018; Green & MacLeod, 2016). In addition, the effect size for x should be adjusted to the value that best fits with the planned study (Kumle et al., 2021).↩︎

To leave a comment for the author, please follow the link and comment on their blog: R on Pablo Bernabeu.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)