Site icon R-bloggers

Shiny application (with modules) – Saving and Restoring from RDS

[This article was first published on Anindya Mozumdar, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I am working on a Shiny application which allows the user to upload data, do some analysis and processing on each variable in the data, and finally use the processed variables to build a statistical model. As there may be hundreds of variables in the data, the user may want to process only a few variables in one sitting and later continue the work from where they left off. They may also want to pass on their work to another user who can then continue processing the variables. The Shiny application is also divided into modules – so there are different modules for data upload and exploration, processing each variables, building the statistical model and run further analysis on the model.

One option to allow this functionality would be to store the partially processed data and variables into an RDS file, which can then be loaded onto another session allowing an user to continue the work. This post describes a simple application to implement this functionality. This framework can also be extended to larger Shiny applications.

The example application has three tabs. The first tab is the Data tab. This allows the user to upload a CSV file. Once uploaded, the names of the variables in the data along with their data types are displayed in a table below. The data is read using the read_csv function from the readr package, and the data types detected automatically. The second tab is the Distribution tab. This provides a dropdown of the numeric (class numeric or integer) variables available in the data. On choosing a variable, it is grouped into deciles and the volume for each group is computed (of course, by definition, the volume will be approximately 10% of the total for each group) and displayed in a table. The third tab allows you to save the state of the application. A download button will store the data, variable details and the computed deciles into an RDS file. The RDS file can be re-uploaded back to the application to retrieve the data and the deciles which were already computed.

There are two Shiny modules. The first module is the data_module which allows you to upload the data and then view the variables and their types. The second module is the distribution_module which provides a dropdown of the numeric variables available in the data and decile them.

The overall UI of the application is created using the code below.

ui <- shinyUI(
  navbarPage(
    "Data Distribution",
    id = "nav_top",
    tabPanel(
      "Data",
      data_ui("data_module")
    ),
    tabPanel(
      "Distribution",
      distribution_ui("distribution_module")
    ),
    tabPanel(
      "Save/Restore",
      downloadButton("save_state", "Save to file"),
      br(),
      br(),
      br(),
      fileInput("restore_state", "Restore from file",
                placeholder = ".rds file")
    )
  )
)

The data_ui and distribution_ui are the UI functions defined by the data and distribution module respectively. The third tab is the “Save/Restore” tab, which comprises a downloadButton to download the RDS and a fileInput to upload the RDS.

The data_ui and distribution_ui functions are really simple. The only thing to note is the use of the NS function to create a namespaced ID.

data_ui <- function(id) {
  
  ns <- NS(id)
  tagList(
    fileInput(ns("in_data"), "Upload data", accept = "text/csv",
              placeholder = "CSV file"),
    tableOutput(ns("out_data"))
  )
  
}

distribution_ui <- function(id) {
  
  ns <- NS(id)
  tagList(
    selectInput(ns("varlist_distribution"), "Select variable", choices = NULL),
    tableOutput(ns("binned_distribution"))
  )
  
}

The data_ui consists of a fileInput to upload the data and a tableOutput to view the variables in the data along with their types. The distribution_ui consists of a selectInput to choose the variable for which the deciles need to be computed and a tableOutput to view the results.

The most important functions in this application are the server functions. This includes the overall server function for the application as well as those of the individual modules.

The data_server function is shown below.

data_server <- function(input, output, session, restored_ds) {
  
  # Reactive uploaded file
  data_file <- reactive({
    validate(need(input$in_data, message = FALSE))
    input$in_data
  })
  
  # Reactive to store data if uploaded or restored
  data_ds <- reactive({
    if (!is.null(restored_ds())) {
      ds <- restored_ds()
    } else {
      ds <- read_csv(data_file()$datapath, guess_max = 50000)
      ds <- as.data.frame(ds)
    }
    ds
  })
  
  # Name and data types of variables for uploaded or restored data
  data_vars <- reactive({
    ds <- data_ds()
    vnames <- gsub("\\.", "_", make.names(colnames(ds), unique = TRUE))
    vtypes <- vapply(ds, class, character(1))
    data.frame(vnames = vnames, vtypes = vtypes, stringsAsFactors = FALSE)
  })
  
  # Display name and data types
  output$out_data <- renderTable({
    data_vars()
  })
  
  # Return
  list(
    data_ds = data_ds,
    data_vars = data_vars
  )
  
}

The arguments to the function include the standard Shiny boilerplate – input, output and session. In addition, we have one more argument called restored_ds. When the application runs, the global server function will call this module with restored_ds set to NULL. However, if the user restores the application state by uploading a RDS file, then the data stored in the RDS will be passed as an argument to this function. In the definition of the reactive data_ds, we first check if restored_ds is NULL; if yes, we read the CSV data uploaded by the user and if not, we return the restored dataset passed to the server function. As you might guess, the variable which is passed by the global server function is itself a reactive (a reactiveValue object in Shiny). The use of reactivity at each stage allows for the outputs to be refreshed on any change. Finally, this module returns two data frames – one containing the uploaded or restored data and the other containing the variables along with their data types.

The distribution_server is shown below.

distribution_server <- function(input, output, session, data_ds_vars,
                                restored_decile) {
  
  # Global reactive values to store deciles
  deciles_g <- reactiveValues()
  
  # Numeric variables
  numeric_vars <- reactive({
    validate(need(data_ds_vars, message = FALSE))
    data_vars <- data_ds_vars$data_vars()
    data_vars <- data_vars[data_vars$vtypes == "numeric" |
                             data_vars$vtypes == "integer", ]
    vnames <- data_vars$vnames
    vnames
  })
  
  # Update choice of variables for which deciles can be created
  observe({
    updateSelectInput(session, "varlist_distribution", "Select variable",
                      choices = numeric_vars())
  }, priority = 20)
  
  # If the list of variables change, reset everything to NULL
  observeEvent(numeric_vars(), {
    for (name in names(deciles_g)) {
      deciles_g[[name]] <<- NULL
    }
  }, priority = 10)
  
  # Calculate deciles whenever a new variable selection is made and display
  observeEvent(input$varlist_distribution, {
    validate(need(data_ds_vars, message = FALSE))
    req(input$varlist_distribution)
    if (is.null(deciles_g[[input$varlist_distribution]])) {
      if (!is.null(restored_decile()) &&
          !is.null(restored_decile()[[input$varlist_distribution]])) {
        deciles_g[[input$varlist_distribution]] <-
          restored_decile()[[input$varlist_distribution]]
      } else {
        deciles_g[[input$varlist_distribution]] <<-
          bin_numeric(data_ds_vars$data_ds(), input$varlist_distribution)
      }
    }
    
    output$binned_distribution <- renderTable({
      deciles_g[[input$varlist_distribution]]
    })
  })
  
  # Store deciles as a list
  deciles <- reactive({
    Filter(Negate(is.null), reactiveValuesToList(deciles_g))
  })
  
  # Return
  deciles
  
}

This is slightly more complex than the data_server. Consider what happens when a new CSV file is uploaded by the user. We first find the numeric variables in the data and display it in the dropdown. This is accomplished using the reactive numeric_vars which refreshes whenever the underlying data changes. We also store the deciles created for any variable in the reactiveValues object deciles_g. If the underlying data changes, we set all the computed deciles to NULL – this is accomplished using a high-priority observeEvent block.

Another observeEvent block observes any changes in the choice of the input variable. It first checks if the deciles have already been computed by checking if the corresponding entry in deciles_g is NULL. If not, it calls the function bin_numeric defined in this module to compute the deciles. Otherwise, it simply retrieves the already saved value.

In order to enable a user to restore previously computed deciles, the function accepts an additional argument called restored_decile. This is similar to the restored_ds argument we have seen in data_server above; a NULL value is passed when the application is run but the user may choose to pass previously computed deciles from the RDS file. In the observeEvent block described in the previous paragraph, we also check if restored_decile is not NULL and whether it has the deciles for the variable stored inside it; if yes, it does not recompute the deciles.

This function returns the deciles reactive, which converts the deciles_g object to a list, removes any NULL values and returns it back to the global server.

Finally, the global server function is shown below.

server <- function(input, output, session) {
  
  restored_ds <- reactiveVal()
  restored_decile <- reactiveVal()
  
  # Call modules
  data_ds_vars <- callModule(data_server, "data_module", restored_ds)
  distributions <- callModule(distribution_server, "distribution_module",
                              data_ds_vars, restored_decile)
  
  # Save state
  output$save_state <- downloadHandler(
    filename = function() {
      paste0("dd-", Sys.Date(), ".rds")
    },
    content = function(file) {
      data_ds <- isolate(data_ds_vars$data_ds())
      data_vars <- isolate(data_ds_vars$data_vars())
      deciles <- isolate(distributions())
      save_list <- list(
        data_ds = data_ds,
        data_vars = data_vars,
        deciles = deciles
      )
      saveRDS(save_list, file)
    } 
  )
  
  # Reactive restore file
  restore_file <- reactive({
    validate(need(input$restore_state, message = FALSE))
    input$restore_state
  })
  
  # Reactive to store restored information
  restored_state <- reactive({
    rs <- readRDS(restore_file()$datapath)
    rs
  })
  
  # Restore state
  observeEvent(restored_state(), {
    rs <- restored_state()
    restored_ds(rs$data_ds)
    restored_decile(rs$deciles)
  })

}

restored_ds and restored_decile are objects of type reactiveValue which are initially NULL. They can be restored by uploading an RDS file, which causes the execution of the observeEvent(restored_state())… block and sets the value for these two reactive variables. The save_state button defines a function to store the data, details of the variables and any computed deciles into an RDS file. The modules are called using the callModule function with appropriate arguments.

Note that the entire behaviour is possible due to reactivity. For example, when the user starts the application, uploads a CSV file and starts creating deciles, the restored_ds and restored_decile values are NULL at this stage. Now, if the user uploads a previously saved RDS, then the reactivity of restored_ds ensures that the data_ds_vars returned by the data module is recomputed. This in turn ensures that the numeric_vars reactive in the distribution module is recomputed leading to a change in the dropdown. Also, the restored_decile argument ensures that any previously computed deciles are read from the RDS and the bin_numeric function is not called on these variables.

The complete code for the application is available as an RStudio project in Github.

To leave a comment for the author, please follow the link and comment on their blog: Anindya Mozumdar.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.