Power and sample size calculator for mitochondrial DNA association studies (Shiny)

[This article was first published on Tales of R » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The functions detailed inside the piece of code below (in a Gist) has been useful for me when I had to calculate many possible scenarios of statistical power and sample size. The formulae were taken from the article of Samuels et al., AJHG 2006, and the script showed even useful for making a variety of comparative plots.

This is intended for estimating power/ sample size in association studies, involving mitochondrial DNA haplogroups (which are categories whose frequencies depend on each other), on a Chi-square test basis. The problem with scripts is that sometimes they aren’t as friendly to many people as GUIs are. To solve this, there are many solutions but, as I don’t have programming background (apart from R), the most straightforward for me was Shiny.

Shiny is a friendly interface which allows for great interactive features (see its Tutorial), and it loads onto the web browser from an open R console, just by clicking:

library(shiny)
shiny:: runGist('5900303')

Where 5900303 is the ID of the Gist. Here is the source:

#under GNU General Public License
#server.R
library(shiny)
library(ggplot2)
shinyServer(function(input, output) {
## My function:
f <- function(n.cases, p0, OR.cas.ctrl, Nh, sig.level) {
ncases <- n.cases
p0 <- p0
Nh <- Nh
OR.cas.ctrl <- OR.cas.ctrl
sig.level <- sig.level
# Parameters related to sig.level, from [Table 2] of Samuels et al.
# For 90% power and alpha = .05, Nscaled = 8.5
if (sig.level == 0.05){
A <- -28 # Parameter A for alpha=.05
x0 <- 2.6 # Parameter x0 for alpha=.05
d <- 2.4 # Parameter d for alpha=.05
}
if (sig.level == 0.01){
A <- -13 # Parameter A for alpha=.01
x0 <- 5 # Parameter x0 for alpha=.01
d <- 2.5 # Parameter d for alpha=.01
}
if (sig.level == 0.001){
A <- -7 # Parameter A for alpha=.001
x0 <- 7.4 # Parameter x0 for alpha=.001
d <- 2.8 # Parameter d for alpha=.001
}
out.pow <- NULL # initialize vector
for(n.cases in n.cases){
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR
OR <- OR.ctrl.cas
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3].
p1 <- bracket.pw / (1 + bracket.pw)
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED
num.n <- ncases*((p1-p0)^2)
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037
Nscaled <- num.n/den.n
num.power <- A - 100 # 3. CALCULATE POWER
den.power <- 1 + exp((Nscaled - x0)/d)
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha
}
OR <- OR.cas.ctrl
out.pow <- data.frame(ncases, Nh, Nscaled, p0, OR, sig.level, power)
out.pow
}
# Show the final calculated value
output$powTable <- renderTable(
{mydata <- f(input$n.cases,
input$p0,
input$OR.cas.ctrl,
input$Nh,
input$sig.level)
mydata}
)
})
view raw server.R hosted with ❤ by GitHub
#under GNU General Public License
#ui.R
library(shiny)
library(ggplot2)
# Define UI for slider demo application
shinyUI(pageWithSidebar(
# Application title
headerPanel("mtDNA power estimation"),
# Sidebar with sliders that demonstrate various available options
sidebarPanel(
# Simple integer interval
sliderInput("n.cases", "number of cases (ncases):",
min=50, max=1000, value = c(200,400)),
sliderInput("p0", "Frequency of the haplogroup in controls (p0):",
min=0, max=1, value=0.5),
sliderInput("OR.cas.ctrl", "Odds ratio to detect (OR):",
min=1.25, max=4, value=2.3, step=0.05),
sliderInput("Nh", "Number of haplogroup categories (Nh):",
min=5, max=16, value=11),
numericInput("sig.level", "Signification level (only 0.05, 0.01, 0.001):", 0.05),
helpText("This calculator gives power and sample size estimates. It is intended for
genetic association studies on mitochondrial DNA haplogroups, involving Chi-square tests.
The number of controls must be equal to ncases."),
helpText("Made from formulae in Samuels et al., AJHG 2006, PMCID: PMC1424681.
Shiny App written by Mareviv.")
),
# Show a table summarizing the values entered
mainPanel(
uiOutput("powTable"))
)
)
# To execute this script from inside R, you need to have the ui.R and server.R files into the session's working directory. Then, type:
# runApp()
# To execute directly this Gist, from the Internet to your browser, type:
# shiny:: runGist('5900303')
view raw ui.R hosted with ❤ by GitHub

To work with files inside your computer, just run R from the same directory of the files ui.R and server.R, and execute the Gist with the command:

runApp()

If this doesn’t work, you can paste the complete path to the ui and server files:

runApp("path/to/directory")

BONUS / EDIT:

New Gist, which displays a simple graph using two power/ncases values (it was hard for me to come up to this, mostly thanks to Stackoverflow and to MadScone):

library(shiny)
shiny:: runGist('5895082')
#under GNU General Public License
#server.R
library(shiny)
library(ggplot2)
shinyServer(function(input, output) {
## My function:
f <- function(min.cases, max.cases, p0, OR.cas.ctrl, Nh, sig.level) {
if (min.cases <= max.cases){
num.cases <- seq(min.cases, max.cases, by=50)}
if (min.cases > max.cases){
num.cases <- seq(max.cases, min.cases, by=50)}
ncases <- num.cases
p0 <- p0
Nh <- Nh
OR.cas.ctrl <- OR.cas.ctrl
sig.level <- sig.level
# Parameters related to sig.level, from [Table 2] of Samuels et al.
# For 90% power and alpha = .05, Nscaled = 8.5
if (sig.level == 0.05){
A <- -28 # Parameter A for alpha=.05
x0 <- 2.6 # Parameter x0 for alpha=.05
d <- 2.4 # Parameter d for alpha=.05
}
if (sig.level == 0.01){
A <- -13 # Parameter A for alpha=.01
x0 <- 5 # Parameter x0 for alpha=.01
d <- 2.5 # Parameter d for alpha=.01
}
if (sig.level == 0.001){
A <- -7 # Parameter A for alpha=.001
x0 <- 7.4 # Parameter x0 for alpha=.001
d <- 2.8 # Parameter d for alpha=.001
}
out.pow <- NULL # initialize vector
for(ncases in ncases){
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR
OR <- OR.ctrl.cas
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3].
p1 <- bracket.pw / (1 + bracket.pw)
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED
num.n <- num.cases*((p1-p0)^2)
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037
Nscaled <- num.n/den.n
num.power <- A - 100 # 3. CALCULATE POWER
den.power <- 1 + exp((Nscaled - x0)/d)
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha
}
OR <- OR.cas.ctrl
sig.level <- as.numeric(sig.level)
power <- round(power, 2)
Nscaled <- round(Nscaled, 3)
out.pow <- data.frame(num.cases, Nh, Nscaled, p0, OR, sig.level, power)
out.pow
}
mydata <- reactive(f(input$min.cases,
input$max.cases,
input$p0,
input$OR.cas.ctrl,
input$Nh,
input$sig.level))
# Show the final calculated value
output$powTable <- renderDataTable(
{mydata <- f(input$min.cases,
input$max.cases,
input$p0,
input$OR.cas.ctrl,
input$Nh,
input$sig.level)
mydata},
options = list(paging = FALSE, searching = FALSE)
)
# Show the Download handlers:
output$downloadData <- downloadHandler(
filename = function() { paste(input$mydata, 'mtDNA_power_table.csv', sep='') },
content = function(file) {
write.csv(mydata(), file, na="")
}
)
output$downloadPlot <- downloadHandler(
filename = function() { paste(input$mydata, 'mtDNA_power_plot.pdf', sep='') },
content = function(file) {
pdf(file)
df <- mydata()
p <- print(ggplot(data = mydata(), aes(num.cases, power)) +
theme_bw() +
theme(text=element_text(size=12)) +
labs(title = paste("Power estimates; ", "p0 =", " ", df[1,4], ", ", "Nh =", " ", df[1,2], ", ", "p =", " ", df[1,6], sep = "")) +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab(paste("Number of Cases, Controls")) +
ylab("Power (%)") +
xlim(50, 1000) +
ylim(0, 100) +
# scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))
dev.off()
}
)
# Show the plot with a statistical power curve
output$powHap <- renderPlot(
{
df <- mydata()
p <- print(ggplot(data = mydata(), aes(num.cases, power)) +
theme_bw() +
theme(text=element_text(size=12)) +
labs(title = paste("Power estimates; ", "p0 =", " ", df[1,4], ", ", "Nh =", " ", df[1,2], ", ", "p =", " ", df[1,6], sep = "")) +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab(paste("Number of Cases, Controls")) +
ylab("Power (%)") +
xlim(50, 1000) +
ylim(0, 100) +
# scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))
})
})
view raw server.R hosted with ❤ by GitHub
#under GNU General Public License
#ui.R
library(shiny)
library(ggplot2)
# Define UI for slider demo application
shinyUI(pageWithSidebar(
# Application title
headerPanel("mtDNA power estimation"),
# Sidebar with sliders that demonstrate various available options
sidebarPanel(
# Simple integer interval
sliderInput("min.cases", "Minimum number of cases (num.cases):",
min=50, max=1000, value = 100, step=50),
sliderInput("max.cases", "Maximum number of cases (num.cases):",
min=50, max=1000, value = 600, step=50),
sliderInput("p0", "Frequency of the haplogroup in controls (p0):",
min=0, max=1, value=0.5),
sliderInput("OR.cas.ctrl", "Odds ratio to detect (OR):",
min=1.05, max=4, value=2.3, step=0.05),
sliderInput("Nh", "Number of haplogroup categories (Nh):",
min=4, max=20, value=11),
selectInput("sig.level", "Significance level:",
choices = c("0.05", "0.01", "0.001")),
HTML("<h4>Instructions:</h4>"),
helpText("Power and sample size estimates for genetic association
studies on mitochondrial DNA haplogroups.
Formulae in Samuels et al., AJHG 2006 ",
a("PMC1424681", href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1424681/", target="_blank"), "."),
helpText("Can be used when multiple Chi-square tests are involved
(one case-control comparison for each haplogroup category).
Cases correspond to the same number of controls."),
helpText("Written in R/Shiny by A. Baluja."),
HTML("<h5>&nbsp;</h5>"),
HTML("<h4>R package via GitHub:</h4>"),
HTML("<pre>devtools::install_github('aurora-mareviv/mthapower')</pre>"),
HTML("<h4> </h4>"),
HTML("<h4>Shiny app (local) via Gist:</h4>"),
HTML("<pre>shiny::runGist('5895082')</pre>"),
downloadButton('downloadData', 'Download dataset'),
downloadButton('downloadPlot', 'Download plot')
),
# Show a table summarizing the values entered
mainPanel(
plotOutput("powHap"),
div(dataTableOutput("powTable"), style = "font-size:90%")
)
)
)
# To execute this script from inside R, you need to have the ui.R and server.R files into the session's working directory. Then, type:
# runApp()
# To execute directly this Gist, from the Internet to your browser, type:
# shiny::runGist('5895082')
view raw ui.R hosted with ❤ by GitHub
Structure of the human mitochondrial genome.

Structure of the human mitochondrial genome. (Photo credit: Wikipedia)


To leave a comment for the author, please follow the link and comment on their blog: Tales of R » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)