Power and sample size calculator for mitochondrial DNA association studies (Shiny)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The functions detailed inside the piece of code below (in a Gist) has been useful for me when I had to calculate many possible scenarios of statistical power and sample size. The formulae were taken from the article of Samuels et al., AJHG 2006, and the script showed even useful for making a variety of comparative plots.
This is intended for estimating power/ sample size in association studies, involving mitochondrial DNA haplogroups (which are categories whose frequencies depend on each other), on a Chi-square test basis. The problem with scripts is that sometimes they aren’t as friendly to many people as GUIs are. To solve this, there are many solutions but, as I don’t have programming background (apart from R), the most straightforward for me was Shiny.
Shiny is a friendly interface which allows for great interactive features (see its Tutorial), and it loads onto the web browser from an open R console, just by clicking:
library(shiny) shiny:: runGist('5900303')
Where 5900303 is the ID of the Gist. Here is the source:
#under GNU General Public License | |
#server.R | |
library(shiny) | |
library(ggplot2) | |
shinyServer(function(input, output) { | |
## My function: | |
f <- function(n.cases, p0, OR.cas.ctrl, Nh, sig.level) { | |
ncases <- n.cases | |
p0 <- p0 | |
Nh <- Nh | |
OR.cas.ctrl <- OR.cas.ctrl | |
sig.level <- sig.level | |
# Parameters related to sig.level, from [Table 2] of Samuels et al. | |
# For 90% power and alpha = .05, Nscaled = 8.5 | |
if (sig.level == 0.05){ | |
A <- -28 # Parameter A for alpha=.05 | |
x0 <- 2.6 # Parameter x0 for alpha=.05 | |
d <- 2.4 # Parameter d for alpha=.05 | |
} | |
if (sig.level == 0.01){ | |
A <- -13 # Parameter A for alpha=.01 | |
x0 <- 5 # Parameter x0 for alpha=.01 | |
d <- 2.5 # Parameter d for alpha=.01 | |
} | |
if (sig.level == 0.001){ | |
A <- -7 # Parameter A for alpha=.001 | |
x0 <- 7.4 # Parameter x0 for alpha=.001 | |
d <- 2.8 # Parameter d for alpha=.001 | |
} | |
out.pow <- NULL # initialize vector | |
for(n.cases in n.cases){ | |
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR | |
OR <- OR.ctrl.cas | |
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3]. | |
p1 <- bracket.pw / (1 + bracket.pw) | |
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED | |
num.n <- ncases*((p1-p0)^2) | |
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037 | |
Nscaled <- num.n/den.n | |
num.power <- A - 100 # 3. CALCULATE POWER | |
den.power <- 1 + exp((Nscaled - x0)/d) | |
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha | |
} | |
OR <- OR.cas.ctrl | |
out.pow <- data.frame(ncases, Nh, Nscaled, p0, OR, sig.level, power) | |
out.pow | |
} | |
# Show the final calculated value | |
output$powTable <- renderTable( | |
{mydata <- f(input$n.cases, | |
input$p0, | |
input$OR.cas.ctrl, | |
input$Nh, | |
input$sig.level) | |
mydata} | |
) | |
}) |
#under GNU General Public License | |
#ui.R | |
library(shiny) | |
library(ggplot2) | |
# Define UI for slider demo application | |
shinyUI(pageWithSidebar( | |
# Application title | |
headerPanel("mtDNA power estimation"), | |
# Sidebar with sliders that demonstrate various available options | |
sidebarPanel( | |
# Simple integer interval | |
sliderInput("n.cases", "number of cases (ncases):", | |
min=50, max=1000, value = c(200,400)), | |
sliderInput("p0", "Frequency of the haplogroup in controls (p0):", | |
min=0, max=1, value=0.5), | |
sliderInput("OR.cas.ctrl", "Odds ratio to detect (OR):", | |
min=1.25, max=4, value=2.3, step=0.05), | |
sliderInput("Nh", "Number of haplogroup categories (Nh):", | |
min=5, max=16, value=11), | |
numericInput("sig.level", "Signification level (only 0.05, 0.01, 0.001):", 0.05), | |
helpText("This calculator gives power and sample size estimates. It is intended for | |
genetic association studies on mitochondrial DNA haplogroups, involving Chi-square tests. | |
The number of controls must be equal to ncases."), | |
helpText("Made from formulae in Samuels et al., AJHG 2006, PMCID: PMC1424681. | |
Shiny App written by Mareviv.") | |
), | |
# Show a table summarizing the values entered | |
mainPanel( | |
uiOutput("powTable")) | |
) | |
) | |
# To execute this script from inside R, you need to have the ui.R and server.R files into the session's working directory. Then, type: | |
# runApp() | |
# To execute directly this Gist, from the Internet to your browser, type: | |
# shiny:: runGist('5900303') |
To work with files inside your computer, just run R from the same directory of the files ui.R and server.R, and execute the Gist with the command:
runApp()
If this doesn’t work, you can paste the complete path to the ui and server files:
runApp("path/to/directory")
BONUS / EDIT:
New Gist, which displays a simple graph using two power/ncases values (it was hard for me to come up to this, mostly thanks to Stackoverflow and to MadScone):
library(shiny) shiny:: runGist('5895082')
#under GNU General Public License | |
#server.R | |
library(shiny) | |
library(ggplot2) | |
shinyServer(function(input, output) { | |
## My function: | |
f <- function(min.cases, max.cases, p0, OR.cas.ctrl, Nh, sig.level) { | |
if (min.cases <= max.cases){ | |
num.cases <- seq(min.cases, max.cases, by=50)} | |
if (min.cases > max.cases){ | |
num.cases <- seq(max.cases, min.cases, by=50)} | |
ncases <- num.cases | |
p0 <- p0 | |
Nh <- Nh | |
OR.cas.ctrl <- OR.cas.ctrl | |
sig.level <- sig.level | |
# Parameters related to sig.level, from [Table 2] of Samuels et al. | |
# For 90% power and alpha = .05, Nscaled = 8.5 | |
if (sig.level == 0.05){ | |
A <- -28 # Parameter A for alpha=.05 | |
x0 <- 2.6 # Parameter x0 for alpha=.05 | |
d <- 2.4 # Parameter d for alpha=.05 | |
} | |
if (sig.level == 0.01){ | |
A <- -13 # Parameter A for alpha=.01 | |
x0 <- 5 # Parameter x0 for alpha=.01 | |
d <- 2.5 # Parameter d for alpha=.01 | |
} | |
if (sig.level == 0.001){ | |
A <- -7 # Parameter A for alpha=.001 | |
x0 <- 7.4 # Parameter x0 for alpha=.001 | |
d <- 2.8 # Parameter d for alpha=.001 | |
} | |
out.pow <- NULL # initialize vector | |
for(ncases in ncases){ | |
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR | |
OR <- OR.ctrl.cas | |
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3]. | |
p1 <- bracket.pw / (1 + bracket.pw) | |
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED | |
num.n <- num.cases*((p1-p0)^2) | |
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037 | |
Nscaled <- num.n/den.n | |
num.power <- A - 100 # 3. CALCULATE POWER | |
den.power <- 1 + exp((Nscaled - x0)/d) | |
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha | |
} | |
OR <- OR.cas.ctrl | |
sig.level <- as.numeric(sig.level) | |
power <- round(power, 2) | |
Nscaled <- round(Nscaled, 3) | |
out.pow <- data.frame(num.cases, Nh, Nscaled, p0, OR, sig.level, power) | |
out.pow | |
} | |
mydata <- reactive(f(input$min.cases, | |
input$max.cases, | |
input$p0, | |
input$OR.cas.ctrl, | |
input$Nh, | |
input$sig.level)) | |
# Show the final calculated value | |
output$powTable <- renderDataTable( | |
{mydata <- f(input$min.cases, | |
input$max.cases, | |
input$p0, | |
input$OR.cas.ctrl, | |
input$Nh, | |
input$sig.level) | |
mydata}, | |
options = list(paging = FALSE, searching = FALSE) | |
) | |
# Show the Download handlers: | |
output$downloadData <- downloadHandler( | |
filename = function() { paste(input$mydata, 'mtDNA_power_table.csv', sep='') }, | |
content = function(file) { | |
write.csv(mydata(), file, na="") | |
} | |
) | |
output$downloadPlot <- downloadHandler( | |
filename = function() { paste(input$mydata, 'mtDNA_power_plot.pdf', sep='') }, | |
content = function(file) { | |
pdf(file) | |
df <- mydata() | |
p <- print(ggplot(data = mydata(), aes(num.cases, power)) + | |
theme_bw() + | |
theme(text=element_text(size=12)) + | |
labs(title = paste("Power estimates; ", "p0 =", " ", df[1,4], ", ", "Nh =", " ", df[1,2], ", ", "p =", " ", df[1,6], sep = "")) + | |
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) + | |
xlab(paste("Number of Cases, Controls")) + | |
ylab("Power (%)") + | |
xlim(50, 1000) + | |
ylim(0, 100) + | |
# scale_x_log10() + | |
geom_line(alpha=0.8, size=0.2) + | |
geom_point(aes(shape = factor(OR)), colour="black")) | |
dev.off() | |
} | |
) | |
# Show the plot with a statistical power curve | |
output$powHap <- renderPlot( | |
{ | |
df <- mydata() | |
p <- print(ggplot(data = mydata(), aes(num.cases, power)) + | |
theme_bw() + | |
theme(text=element_text(size=12)) + | |
labs(title = paste("Power estimates; ", "p0 =", " ", df[1,4], ", ", "Nh =", " ", df[1,2], ", ", "p =", " ", df[1,6], sep = "")) + | |
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) + | |
xlab(paste("Number of Cases, Controls")) + | |
ylab("Power (%)") + | |
xlim(50, 1000) + | |
ylim(0, 100) + | |
# scale_x_log10() + | |
geom_line(alpha=0.8, size=0.2) + | |
geom_point(aes(shape = factor(OR)), colour="black")) | |
}) | |
}) |
#under GNU General Public License | |
#ui.R | |
library(shiny) | |
library(ggplot2) | |
# Define UI for slider demo application | |
shinyUI(pageWithSidebar( | |
# Application title | |
headerPanel("mtDNA power estimation"), | |
# Sidebar with sliders that demonstrate various available options | |
sidebarPanel( | |
# Simple integer interval | |
sliderInput("min.cases", "Minimum number of cases (num.cases):", | |
min=50, max=1000, value = 100, step=50), | |
sliderInput("max.cases", "Maximum number of cases (num.cases):", | |
min=50, max=1000, value = 600, step=50), | |
sliderInput("p0", "Frequency of the haplogroup in controls (p0):", | |
min=0, max=1, value=0.5), | |
sliderInput("OR.cas.ctrl", "Odds ratio to detect (OR):", | |
min=1.05, max=4, value=2.3, step=0.05), | |
sliderInput("Nh", "Number of haplogroup categories (Nh):", | |
min=4, max=20, value=11), | |
selectInput("sig.level", "Significance level:", | |
choices = c("0.05", "0.01", "0.001")), | |
HTML("<h4>Instructions:</h4>"), | |
helpText("Power and sample size estimates for genetic association | |
studies on mitochondrial DNA haplogroups. | |
Formulae in Samuels et al., AJHG 2006 ", | |
a("PMC1424681", href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1424681/", target="_blank"), "."), | |
helpText("Can be used when multiple Chi-square tests are involved | |
(one case-control comparison for each haplogroup category). | |
Cases correspond to the same number of controls."), | |
helpText("Written in R/Shiny by A. Baluja."), | |
HTML("<h5> </h5>"), | |
HTML("<h4>R package via GitHub:</h4>"), | |
HTML("<pre>devtools::install_github('aurora-mareviv/mthapower')</pre>"), | |
HTML("<h4> </h4>"), | |
HTML("<h4>Shiny app (local) via Gist:</h4>"), | |
HTML("<pre>shiny::runGist('5895082')</pre>"), | |
downloadButton('downloadData', 'Download dataset'), | |
downloadButton('downloadPlot', 'Download plot') | |
), | |
# Show a table summarizing the values entered | |
mainPanel( | |
plotOutput("powHap"), | |
div(dataTableOutput("powTable"), style = "font-size:90%") | |
) | |
) | |
) | |
# To execute this script from inside R, you need to have the ui.R and server.R files into the session's working directory. Then, type: | |
# runApp() | |
# To execute directly this Gist, from the Internet to your browser, type: | |
# shiny::runGist('5895082') |

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.