Site icon R-bloggers

Tracking Stan sampling progress in Shiny

[This article was first published on R-bloggers | A Random Walk, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
  • Introduction

    The previous post demonstrates the use of pre-compiled Stan models in interactive R Shiny applications to avoid unnecessary Stan model (re-)compilation on application start-up. In this short follow-up post we go a step further and tackle the issue of tracking the Stan model sampling progress itself in a shiny-application. The idea, mainly inspired by this blog post, is to execute the Stan model sampling in a separate background R process using the convenient callr-package to avoid blocking of the shiny-application.

    Demo Shiny-app

    For illustration purposes, we start from the basic shiny-app in the previous post that fits a simple linear model in Stan and plots the sampled posteriors:

    library(shiny)
    ui <- fluidPage(
    sidebarLayout(
    sidebarPanel(
    numericInput("N", label = "N", value = 10)
    ),
    mainPanel(
    plotOutput("posteriors")
    )
    )
    )
    server <- function(input, output, session) {
    ## draw samples
    draws <- reactive({
    N <- input$N
    rstan::sampling(
    object = shinyStanModels:::stanmodels[["lm"]],
    data = list(N = N, x = seq_len(N), y = rnorm(N, seq_len(N), 0.1)),
    chains = 2,
    iter = 1000
    )
    })
    ## plot histograms
    output$posteriors <- renderPlot({
    req(draws())
    op <- par(mfrow = c(1, 2), cex = 1.25)
    hist(rstan::extract(draws(), "alpha")[[1]], main = bquote("Posterior samples"~alpha), xlab = expression(alpha))
    hist(rstan::extract(draws(), "beta")[[1]], main = bquote("Posterior samples"~beta), xlab = expression(beta))
    par(op)
    })
    }
    shinyApp(ui = ui, server = server)

    Note: here the pre-compiled "lm" Stan model is called from a dummy R-package shinyStanModels as explained previously to avoid model recompilation every time the shiny-app is launched.

    Shiny-app blocking

    By increasing the number observations \(N\), we also increase the execution time of the MCMC sampling process. Setting e.g. \(N = 500\), it already takes quite a while for the computations to complete1. Unfortunately, there is no way to tell from the shiny-app whether the sampling iterations are progressing quickly or still require a long time to finish. In an interactive R-session, the progress of the sampling iterations across individual chains is either printed to the console (that is, the current stdout connection) or to a temporary file when multiple chains are being sampled in parallel. It would be great if we could divert these progress messages to the shiny-app itself, so that the sampling iterations can be tracked in real-time in the same way as in any interactive R-session.

    A first idea that comes to mind is to make use of shiny’s reactivePoll() or reactiveFileReader() to periodically read the contents of the temporary progress file generated by Stan, and display these contents in reactively in the shiny-app. The problem, however, is that due to shiny’s synchronous reactive nature the shiny-app is effectively blocked while executing the sampling process, and the reactiveFileReader() will only have a chance to read the progress file when the model sampling is already finished, thereby defeating its purpose.

    Instead, we have to somehow detach the slow model sampling process from the main shiny-app process so that both processes do not interfere with each other and the shiny-app is free to execute other requests. In fact, what is needed is a way to launch the sampling process in a separate background R-process, and, as it turns out, this is exactly what is provided by the convenient callr-package.

    Unblocking a shiny-app with callr

    In order to execute the Stan model sampling procedure in a separate R-process, we wrap the call to rstan::sampling() in a function and pass it to the func argument of callr::r_bg(). The function is evaluated in a background R-process and may take a long time to complete, but callr::r_bg() immediately returns control to the current R-process so that it is free to perform other operations:

    library(callr)
    ## launch job in separate R process
    r_process <- r_bg(
    func = function(N) {
    rstan::sampling(
    object = shinyStanModels:::stanmodels[["lm"]],
    data = list(N = N, x = seq_len(N), y = rnorm(N, seq_len(N), 0.1)),
    open_progress = FALSE,
    chains = 2,
    iter = 1000
    )
    },
    args = list(N = 500)
    )
    ## the process object is returned immediately
    ## and can be polled from the current R process
    r_process
    #> PROCESS 'R', running, pid 27966.
    ## check if job is still running
    r_process$is_alive()
    #> [1] TRUE
    ## result is not yet available
    tryCatch(r_process$get_result(), error = function(e) e$message)
    #> [1] "Still alive"
    ## extract the result when the job is finished
    system.time({
    r_process$wait()
    })
    #> user system elapsed
    #> 25.697 0.102 25.716
    r_process
    #> PROCESS 'R', finished.
    r_process$get_result()
    #> Inference for Stan model: lm.
    #> 2 chains, each with iter=1000; warmup=500; thin=1;
    #> post-warmup draws per chain=500, total post-warmup draws=1000.
    #>
    #> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
    #> alpha 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 382 1.00
    #> beta 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 473 1.00
    #> sigma 0.10 0.00 0.00 0.10 0.10 0.10 0.11 0.11 345 1.01
    #> lp__ 876.27 0.07 1.18 873.03 875.81 876.53 877.09 877.60 310 1.00
    #>
    #> Samples were drawn using NUTS(diag_e) at Sun Feb 28 13:53:14 2021.
    #> For each parameter, n_eff is a crude measure of effective sample size,
    #> and Rhat is the potential scale reduction factor on split chains (at
    #> convergence, Rhat=1).

    Note: launching a separate R-process does not necessarily require a multi-core processor and also works using only a single-core processor. On the other hand, the R-session does need to have sufficient permission/rights to launch a separate R-process on the system.

    Demo Shiny-app (continued)

    In a first step, we update the shiny-app by wrapping the rstan::sampling() call in a call to callr::r_bg() so that the execution of the model sampling no longer blocks the shiny-app process. Using the stdout argument of callr::r_bg() we can divert2 all messages generated by rstan::sampling() (and printed to stdout) to a temporary file tfile.

    Whenever the number of observations \(N\) is changed and a new background R-process is launched, we instruct the shiny-app to start polling the process object (r$bg_process) by updating the reactive value r$poll. This will trigger an observer to: (1) read and assign the contents of the progress file tfile to a reactive value r$progress whenever the progress file has been modified; (2) check if the background process has finished executing and if this is the case retrieve the sampling results. The observer will be re-executed every second and only stops re-executing when the background process is no longer alive and r$poll has been turned off (i.e. set to FALSE).

    The progress messages can now be rendered in real-time in the shiny-app by simply pasting the contents of the reactive value r$progress. The posterior histograms only need to be updated whenever new sampling results are available in the reactive value r$draws.

    In short, we can summarize the updated shiny-app with the following graph:

    More importantly, the updated shiny-app –including model sampling progress– now looks as follows:

    library(shiny)
    ui <- fluidPage(
    sidebarLayout(
    sidebarPanel(
    numericInput("N", label = "N", value = 10)
    ),
    mainPanel(
    plotOutput("posteriors"),
    hr(),
    ## limit the height of the progress message box
    tags$head(tags$style("#progress{overflow-y:scroll; max-height: 500px;}")),
    verbatimTextOutput("progress")
    )
    )
    )
    server <- function(input, output, session) {
    ## file to write progress
    tfile <- tempfile(fileext = ".txt")
    ## reactive values
    r <- reactiveValues(
    progress_mtime = -1
    )
    observeEvent(input$N, {
    ## start sampling in background process
    r$bg_process <- callr::r_bg(
    ## this is a long running computation
    func = function(N) {
    rstan::sampling(
    object = shinyStanModels:::stanmodels[["lm"]],
    data = list(N = N, x = seq_len(N), y = rnorm(N, seq_len(N), 0.1)),
    open_progress = FALSE,
    iter = 1000,
    chains = 2
    )
    },
    args = list(N = input$N),
    stdout = tfile,
    supervise = TRUE
    )
    ## start polling bg process
    r$poll <- TRUE
    })
    ## observe status of bg process
    observe({
    req(r$bg_process, r$poll)
    ## keep re-executing observer as
    ## long as bg process is running
    invalidateLater(millis = 1000, session)
    ## read current progress if file is modified
    mtime <- file.info(tfile)$mtime
    if(mtime > r$progress_mtime) {
    r$progress <- readLines(tfile)
    r$progress_mtime <- mtime
    }
    ## extract draws when bg process is finished
    if(!r$bg_process$is_alive()) {
    r$draws <- r$bg_process$get_result()
    r$poll <- FALSE ## stop polling bg process
    }
    })
    ## plot histograms
    output$posteriors <- renderPlot({
    req(r$draws)
    op <- par(mfrow = c(1, 2), cex = 1.25)
    hist(rstan::extract(r$draws, "alpha")[[1]], main = bquote("Posterior samples"~alpha), xlab = expression(alpha))
    hist(rstan::extract(r$draws, "beta")[[1]], main = bquote("Posterior samples"~beta), xlab = expression(beta))
    par(op)
    })
    ## print progress
    output$progress <- renderText({
    req(r$progress)
    paste(r$progress, collapse = "\n")
    })
    }
    shinyApp(ui = ui, server = server)

    Figure 1: Progress of sampling iterations with two chains (in parallel) on a multi-core processor

    Figure 2: Progress of sampling iterations with two chains (in sequence) on a single-core processor3

    Session Info

    #> R version 4.0.2 (2020-06-22)
    #> Platform: x86_64-pc-linux-gnu (64-bit)
    #> Running under: Ubuntu 18.04.5 LTS
    #>
    #> Matrix products: default
    #> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
    #> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
    #>
    #> locale:
    #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
    #> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
    #> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
    #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
    #> [9] LC_ADDRESS=C LC_TELEPHONE=C
    #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    #>
    #> attached base packages:
    #> [1] stats graphics grDevices utils datasets methods base
    #>
    #> other attached packages:
    #> [1] callr_3.5.1
    #>
    #> loaded via a namespace (and not attached):
    #> [1] rstan_2.21.2 tidyselect_1.1.0 xfun_0.21
    #> [4] bslib_0.2.4 purrr_0.3.4 V8_3.4.0
    #> [7] colorspace_2.0-0 vctrs_0.3.6 generics_0.1.0
    #> [10] htmltools_0.5.1.1 stats4_4.0.2 loo_2.4.1
    #> [13] yaml_2.2.1 rlang_0.4.10 pkgbuild_1.2.0
    #> [16] jquerylib_0.1.3 pillar_1.4.7 DBI_1.1.1
    #> [19] glue_1.4.2 withr_2.4.1 RColorBrewer_1.1-2
    #> [22] matrixStats_0.58.0 lifecycle_0.2.0 stringr_1.4.0
    #> [25] munsell_0.5.0 blogdown_1.1.7 gtable_0.3.0
    #> [28] visNetwork_2.0.9 htmlwidgets_1.5.3 codetools_0.2-16
    #> [31] evaluate_0.14 inline_0.3.17 knitr_1.31
    #> [34] ps_1.5.0 shinyStanModels_0.1 DiagrammeR_1.0.6.1
    #> [37] parallel_4.0.2 curl_4.3 highr_0.8
    #> [40] Rcpp_1.0.6 scales_1.1.1 RcppParallel_5.0.2
    #> [43] StanHeaders_2.21.0-7 jsonlite_1.7.2 gridExtra_2.3
    #> [46] ggplot2_3.3.3 digest_0.6.27 stringi_1.5.3
    #> [49] bookdown_0.21 processx_3.4.5 dplyr_1.0.4
    #> [52] grid_4.0.2 cli_2.3.0 tools_4.0.2
    #> [55] magrittr_2.0.1 sass_0.3.1 tibble_3.0.6
    #> [58] crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.1
    #> [61] prettyunits_1.1.1 assertthat_0.2.1 rmarkdown_2.6.6
    #> [64] R6_2.5.0 compiler_4.0.2

    1. this Stan program could be made more efficient by specifying reasonable priors, but that is not the purpose of this illustration.↩︎

    2. this approach also works when multiple chains are sampled in parallel, i.e. if the cores argument in rstan::sampling() is larger than 1.↩︎

    3. here the shiny-app is launched in an isolated docker container with a limit on the cpu-usage (https://docs.docker.com/config/containers/resource_constraints/#cpu)↩︎

    To leave a comment for the author, please follow the link and comment on their blog: R-bloggers | A Random Walk.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.