Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R/Shiny allows you to prototype a working web application quickly and easily. However, with increasing amounts of data, your app may become slow and, in extreme cases, crash due to insufficient memory.
When the worst-case scenario happens, we need to figure out a way to lower the memory usage of our app to avoid those crashes.
A crucial part of optimization efforts is benchmarking how much memory our app is consuming. This allows us to check if the changes we made to the app are indeed moving us in the right direction.
In this step-by-step guide, we will describe how to do that based on an example application.
How to Measure Memory Usage of Shiny
You might already be familiar with the {profmem} package for profiling memory usage of R expressions. {profmem} uses Rprofmem under the hood, and its documentation notes that with utils::Rprofmem() it is not possible to quantify the total memory usage at a given time, because it only logs allocations and therefore does not reflect deallocations done by the garbage collector.
Additionally, Rprofmem does not track allocations made by non-R native libraries or by packages that use native calloc() or free() for internal objects.
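To illustrate what "only logs allocations" means in practice, here is a minimal sketch. It assumes {profmem} is installed and that R was built with memory profiling support; the vector size is an arbitrary choice for illustration.

```r
library(profmem)

# Memory profiling must be enabled in the R build for this to work
stopifnot(capabilities("profmem"))

p <- profmem({
  x <- numeric(1e6)  # allocates roughly 8 MB
  rm(x)              # the vector becomes garbage ...
  invisible(gc())    # ... and is collected here
})

# Total bytes allocated while the expression ran; the deallocation
# above is not subtracted from this figure
total(p)
```

Even though the vector is freed before the expression finishes, the reported total still counts its allocation, so this number cannot tell us how much memory the process holds at any given moment.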
In the context of Shiny, we are usually interested in how much memory the R process running our app is using. That information allows us to estimate what infrastructure we will need to provision in order to host our app and get an overall feel of how our app scales memory-wise (e.g. does memory usage increase drastically with more users?).
To achieve that, we will use the {bench} package, which provides the bench_process_memory function. That function uses operating system APIs to determine how much memory is used by the current R process, including all the memory from child processes and memory allocated outside R’s garbage collector heap.
bench::bench_process_memory informs us not only about the currently used amount of memory but also about the peak memory usage that occurred during the process lifecycle.
{bench} already supports that. Hence, we recommend using {bench}.
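As a quick illustration, the function can be called directly in any R session. The sketch below assumes {bench} is installed; the size of the temporary vector is arbitrary, and the exact numbers will differ between machines and operating systems.

```r
library(bench)

# Snapshot before doing any work
before <- bench_process_memory()

# Allocate and fill roughly 80 MB, then take another snapshot
x <- runif(1e7)
during <- bench_process_memory()

# Release the vector and measure again
rm(x)
invisible(gc())
after <- bench_process_memory()

# Each snapshot has two entries: "current" and "max"
print(before)
print(during)
print(after)
```

The "max" entry is the value to watch when sizing infrastructure: it records the peak reached by the process, so it does not go back down after the vector is released.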
Throughout our example, we will use the following helper function:
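# Block until the app responds to HTTP requests (retrying for up to 5 seconds)
wait_for_app_to_start <- function(url) {
  httr2::request(url) |>
    httr2::req_retry(
      max_seconds = 5,
      # Also retry on connection failures while the app is still starting;
      # this needs an httr2 version that provides `retry_on_failure`
      retry_on_failure = TRUE,
      backoff = function(attempt) 2^attempt
    ) |>
    httr2::req_perform()
}

measure_mem_usage <- function() {
  result_file <- tempfile(fileext = ".rds")
  port <- httpuv::randomPort()

  # Run the app in a fresh background R process
  app_process <- callr::r_bg(
    function(result_file, port) {
      # Save current and peak process memory when the app stops
      on.exit({
        saveRDS(bench::bench_process_memory(), result_file)
      })
      shiny::runApp(port = port)
    },
    args = list(result_file = result_file, port = port)
  )
  # Make sure the background process never outlives this function
  on.exit({
    if (app_process$is_alive()) {
      app_process$kill()
    }
  })

  app_url <- paste0("http://127.0.0.1:", port)
  wait_for_app_to_start(app_url)
  utils::browseURL(app_url)

  cat("Press [enter] to finish the test...")
  line <- readline()

  # Interrupt the app so that its on.exit handler writes the measurements
  app_process$interrupt()
  app_process$wait()
  readRDS(result_file)
}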
Let’s break down one by one what is happening in this function:
- We start a shiny app in a separate R process – this is important as we don’t want the work we did previously in our R session to impact the results (e.g. we might have analyzed a large dataset which could be the source of peak memory usage)
- We register a callback on function exit that will save the memory measurements in a temporary file
- After the background R process with our app is started, our function opens the app in our browser and waits for user input. This gives us time to simulate user interactions with our app.
- Once we are done clicking through our app, we can hit enter in our R console, and the background process will be interrupted. Once the background process terminates, we read memory measurements from the temporary file.
Discover more insights on boosting your app’s speed and efficiency in our detailed piece: shiny.benchmark – How to Measure Performance Improvements in R Shiny Apps.
Example App
All right, now let’s use our memory benchmarking function on an actual app. Let’s assume we are working with credit card data; we will generate a fake dataset using {charlatan} and save it in an SQLite database:
library(charlatan)
library(DBI)
library(dplyr)

set.seed(123)

# Generate fake data
TABLE_ROW_COUNT <- 1e7
fake_providers <- ch_credit_card_provider(100)
fake_data <- data.frame(
  provider = sample(fake_providers, size = TABLE_ROW_COUNT, replace = TRUE)
)

# Save data to an SQLite database
conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
dbWriteTable(
  conn = conn,
  name = "credit_cards",
  value = fake_data,
  overwrite = TRUE
)

# Close the connection once the table is written
dbDisconnect(conn)
Now, let’s create a Shiny App that will display the top 10 most popular card providers:
library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  # Fetched inside the server function, so every session gets its own copy
  credit_cards <- dbGetQuery(
    conn = conn,
    "SELECT * FROM credit_cards"
  )

  output$top_credit_providers <- renderReactable({
    top_providers <- credit_cards |>
      group_by(provider) |>
      summarise(popularity = n()) |>
      arrange(desc(popularity)) |>
      head(10) |>
      collect()
    reactable(top_providers)
  })
}

shinyApp(ui, server)
Let’s see how much memory the app uses with our helper function:
> measure_mem_usage()
Press [enter] to finish the test...
current     max
  481MB   481MB
Ok, now let’s see how that changes if we simulate multiple sessions within the app – this can be done by opening multiple tabs with our app. Here are the results for 2, 3, 4 and 5 sessions:
> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
current     max
  606MB   606MB

> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
current     max
  678MB   678MB

> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
current     max
  769MB   769MB

> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
current     max
  844MB   844MB
Based on the above measurements, we can see that each additional session allocates an extra 72MB to 125MB of memory.
Let’s try to make our app more efficient. Some of you have probably noticed that we are fetching the data separately for each session, which means we store the same data multiple times in our app.
We can make that more efficient by fetching the data in the global scope.
library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

# Fetched once in the global scope and shared by all sessions
credit_cards <- dbGetQuery(
  conn = conn,
  "SELECT * FROM credit_cards"
)

shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  output$top_credit_providers <- renderReactable({
    top_providers <- credit_cards |>
      group_by(provider) |>
      summarise(popularity = n()) |>
      arrange(desc(popularity)) |>
      head(10) |>
      collect()
    reactable(top_providers)
  })
}

shinyApp(ui, server)
Let’s measure if that made our app more memory efficient:
> measure_mem_usage() # 1 session
Press [enter] to finish the test...
current     max
  474MB   474MB

> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
current     max
  497MB   497MB

> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
current     max
  503MB   503MB

> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
current     max
  530MB   530MB

> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
current     max
  546MB   546MB
As we can see, our app now allocates an extra 6MB to 27MB per session, about 18MB on average compared with about 91MB before. That is roughly a 5x improvement!
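The per-session growth can be worked out from the two sets of measurements reported above. Here is a small sketch in base R; the variable names are chosen only for illustration.

```r
# Process memory (MB) for 1 to 5 sessions, copied from the measurements above
per_session_fetch <- c(481, 606, 678, 769, 844)  # data fetched in every session
global_fetch      <- c(474, 497, 503, 530, 546)  # data fetched once, globally

# Extra memory added by each additional session
growth_per_session <- diff(per_session_fetch)  # 125 72 91 75
growth_global      <- diff(global_fetch)       # 23 6 27 16

# Smallest and largest increase in each variant
range(growth_per_session)
range(growth_global)

# Ratio of the average growth per additional session
mean(growth_per_session) / mean(growth_global)
```

Comparing averages rather than single increments smooths out the noise between individual runs, which is noticeable in measurements taken by hand.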
Let’s try to make it even better! Currently we are fetching the whole credit card data into the R process memory, but we only display the top 10 values! What a waste of memory!
Let’s fix that by extracting computations into the database. This is very easy thanks to {dbplyr}, as we can reuse the same {dplyr} functions.
library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

# A lazy reference to the table; no rows are loaded into R here
credit_cards <- tbl(conn, "credit_cards")

shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  output$top_credit_providers <- renderReactable({
    # The pipeline is translated to SQL; collect() fetches only the result
    top_providers <- credit_cards |>
      group_by(provider) |>
      summarise(popularity = n()) |>
      arrange(desc(popularity)) |>
      head(10) |>
      collect()
    reactable(top_providers)
  })
}

shinyApp(ui, server)
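To check that the work really happens in the database, we can inspect the SQL that {dbplyr} generates for the same pipeline. The sketch below is only an illustration: it uses an in-memory SQLite database and three made-up rows instead of the generated dataset, and assumes {DBI}, {RSQLite}, {dplyr} and {dbplyr} are installed.

```r
library(DBI)
library(dplyr)

# In-memory database with a tiny, made-up table (illustration only)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(
  conn,
  "credit_cards",
  data.frame(provider = c("Visa", "Visa", "Mastercard"))
)

# Same pipeline as in the app, but without collect(): nothing is fetched yet
query <- tbl(conn, "credit_cards") |>
  group_by(provider) |>
  summarise(popularity = n()) |>
  arrange(desc(popularity)) |>
  head(10)

# Print the SQL that will be sent to the database
show_query(query)

# Only the aggregated rows are brought into R
top_providers <- collect(query)
print(top_providers)

dbDisconnect(conn)
```

Because the aggregation and the row limit are part of the query, the R process only ever receives at most 10 rows, regardless of how large the table is.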
Let’s repeat our benchmarks:
> measure_mem_usage() # 1 session
Press [enter] to finish the test...
current     max
  229MB   229MB

> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
current     max
  225MB   225MB

> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
current     max
  231MB   231MB

> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
current     max
  232MB   232MB

> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
current     max
  233MB   233MB

> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
current     max
  233MB   233MB
Now the memory usage of our app seems to be barely increasing; there is only a 4MB difference between the app used by 1 user and the app used by 5 users.
Not to mention that compared to the apps that were fetching whole datasets into memory, we are saving 245MB of memory!
Limitations
The described method of measuring the memory usage of a Shiny app has its limitations. For example, if our app uses {promises}, our measurements might be less accurate depending on the type of future backend we are using.
If our backend uses child processes, bench::bench_process_memory will include them in the measurements. For example, when using future::multicore, futures are run in child processes of the main R process.
However, if we are using future::multisession, futures are run in separate processes (not child processes), and in that case, memory used by those processes won’t be included in the measurements.
Conclusion
In this blog post, we described how to benchmark the memory usage of a Shiny app using the {bench} package.
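Additionally, we showed that loading the data once in the global scope and then extracting computations into a database sharply reduces memory usage: the growth per additional session dropped to a few megabytes, and the baseline usage fell by 245MB.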
This improves the scalability of our application and might allow us to cut down on infrastructure costs, as machines with less memory can be used to handle the same traffic.
If you found this article helpful, don’t miss out on the latest trends and advancements in R/Shiny — subscribe to Shiny Weekly for regular updates and exclusive content.
The post appeared first on appsilon.com/blog/.