Shiny: Fast Data Loading with fst

[This article was first published on Philipp Probst, and kindly contributed to R-bloggers].

I had several projects where I had to load a big dataset for my Shiny app. This loading was usually done at startup and took more than 3 minutes. My goal was to reduce this time. I started thinking about the problem and realized that the whole dataset is not required when the app starts.

Fast and flexible data loading with fst

My first idea was to use a database, e.g. RSQLite. I also liked MonetDB a lot (much faster than RSQLite), but I could not get it to run on the system where it was required, so I searched for alternatives.

Then I discovered the fst R package. It saves datasets (data.frame/data.table) in the fst format, and loading them is reasonably fast. Moreover, one can access selected rows and columns without loading the whole dataset, so it provides some of the functionality of a database. Loading part of the dataset is much faster than loading all of it.

So this was the functionality I needed for speeding up the data loading process.
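
To illustrate partial loading, here is a minimal sketch using read_fst with its columns, from and to arguments. The toy dataset, its column names and the temporary file are assumptions made for the example and are not part of my app.

```r
library(fst)

# Toy dataset; the columns are illustrative only
df <- data.frame(
  ID      = 1:1000,
  year    = rep(2011:2020, each = 100),
  outcome = rep(c(0, 1), times = 500)
)

# Write it once to a temporary fst file
path <- tempfile(fileext = ".fst")
write_fst(df, path)

# Read only two columns and only rows 101 to 200;
# the remaining rows and columns are not loaded into memory
part <- read_fst(path, columns = c("ID", "year"), from = 101, to = 200)
dim(part)
```

Only the requested block of rows and the two named columns end up in part.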

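My strategy in the Shiny app was then to load only the rows and columns that are needed when the app starts, and to load further columns from the fst file later, when they are actually required.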

All in all, this reduced the loading time at app startup from 3 minutes to 5 seconds. Loading data later on (usually only 1 or 2 columns at once) had no noticeable impact on the performance of the app.

Some technical details

Here are some technical details on how I implemented this in my app. The following packages are required:

library(shiny)
library(fst)
library(data.table)

I initialize the data as an empty reactiveVal, and create reactiveValues to hold the fst file reference and the selected rows and columns of the fst file.

data = reactiveVal(NULL)
tmp_all = reactiveValues(fst = NULL, rows_fst = NULL, cols_fst = NULL)

Then I open the fst file that I saved beforehand with write_fst. Note that this command only creates a reference to the file; the dataset itself is not loaded yet.

tmp_fst = fst(my_path)
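
As a small sketch of what this reference offers, the example below writes a toy file and inspects it without reading the data columns. The dataset and the temporary path are assumptions for illustration; dim and names are used here because fst provides methods for them on the object returned by fst().

```r
library(fst)

# Toy file standing in for the real dataset (hypothetical content)
my_path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(ID = 1:5, year = 2016:2020, outcome = c(0, 1, 1, 0, 1)),
  my_path
)

# Create a reference to the file; only metadata is read at this point
tmp_fst <- fst(my_path)

dim(tmp_fst)    # number of rows and columns, taken from the file metadata
names(tmp_fst)  # column names, also taken from the metadata
```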

I specify the rows and columns I want to load and save them in tmp_all:

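# bis is the upper limit for year; it is set elsewhere in the app
rows_fst = tmp_fst$year <= bis   # reads only the year column from the file
cols_fst = c("ID", "year", "outcome")
tmp_all$rows_fst = rows_fst
tmp_all$cols_fst = cols_fst
tmp_all$fst = tmp_fst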

Then I load the dataset as data.table and save it in the reactive Value:

tmp = setDT(tmp_fst[rows_fst, cols_fst])
data(tmp)
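
The row and column selection can be tried outside of Shiny. The following sketch runs the same steps on a toy file; the dataset, the extra income column and the value of bis are assumptions for illustration.

```r
library(fst)
library(data.table)

# Toy file with one column (income) that is not needed at startup
path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(
    ID      = 1:6,
    year    = 2015:2020,
    outcome = c(1, 0, 1, 1, 0, 1),
    income  = seq(10, 60, by = 10)
  ),
  path
)

tmp_fst  <- fst(path)
bis      <- 2018                          # hypothetical cut-off year
rows_fst <- tmp_fst$year <= bis           # logical vector, one entry per row
cols_fst <- c("ID", "year", "outcome")

# Load only the selected rows and columns and convert to a data.table
tmp <- setDT(tmp_fst[rows_fst, cols_fst])
tmp
```

The income column stays on disk until it is requested.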

I wrote a function to add variables afterwards. I test first if they are already available and only add new variables:

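add_variable <- function(tmp, tmp_all, new_vars) {
  # keep only the variables that are not yet in the dataset
  inputs = new_vars[!(new_vars %in% colnames(tmp))]
  if(length(inputs) > 0) {
    tmp_fst = tmp_all$fst
    rows_fst = tmp_all$rows_fst
    # read the missing columns for the same rows that were loaded at startup
    tmp_calc = tmp_fst[rows_fst, inputs, drop = FALSE]
    # add the new columns to the data.table (by reference)
    tmp = tmp[, (inputs) := tmp_calc]
  }
  return(tmp)
}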
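
To show the function at work outside of Shiny, here is a sketch with a condensed copy of add_variable. It uses a plain list in place of reactiveValues (both can be accessed with $), and the toy dataset with its income column is an assumption for illustration.

```r
library(fst)
library(data.table)

# Condensed copy of the function above, for a self-contained example
add_variable <- function(tmp, tmp_all, new_vars) {
  inputs <- new_vars[!(new_vars %in% colnames(tmp))]
  if (length(inputs) > 0) {
    tmp_calc <- tmp_all$fst[tmp_all$rows_fst, inputs]
    tmp <- tmp[, (inputs) := tmp_calc]
  }
  return(tmp)
}

# Toy file (hypothetical content)
path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(
    ID      = 1:6,
    year    = 2015:2020,
    outcome = c(1, 0, 1, 1, 0, 1),
    income  = seq(10, 60, by = 10)
  ),
  path
)

# A plain list standing in for the reactiveValues object
tmp_fst <- fst(path)
tmp_all <- list(fst = tmp_fst, rows_fst = tmp_fst$year <= 2018)

tmp <- setDT(tmp_fst[tmp_all$rows_fst, c("ID", "year", "outcome")])

# "year" is already present and is skipped; only "income" is read from disk
tmp <- add_variable(tmp, tmp_all, c("year", "income"))
names(tmp)
```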

Finally, the variables are added to the dataset and are then readily available to the Shiny app. (data_filtered is a further reactive value of the app; its definition is not shown here.)

new_vars = c("the_new_variable")
tmp <- add_variable(tmp = data(), tmp_all, new_vars)
data_filtered(tmp)
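
Putting the pieces together, a server function could look roughly like the sketch below. This is not the code of my app: the input name extra_vars, the toy dataset and the cut-off year are hypothetical, and the sketch works on a copy of the data.table so that the reactive value receives a new object rather than one modified in place.

```r
library(shiny)
library(fst)
library(data.table)

# Toy file (hypothetical content)
path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(
    ID      = 1:6,
    year    = 2015:2020,
    outcome = c(1, 0, 1, 1, 0, 1),
    income  = seq(10, 60, by = 10)
  ),
  path
)

server <- function(input, output, session) {
  ft   <- fst(path)          # reference only, nothing loaded yet
  rows <- ft$year <= 2018    # hypothetical cut-off

  # Startup: load only the columns needed right away
  data <- reactiveVal(setDT(ft[rows, c("ID", "year", "outcome")]))

  # Later: fetch further columns when the (hypothetical) input requests them
  observeEvent(input$extra_vars, {
    current      <- copy(data())
    missing_vars <- setdiff(input$extra_vars, names(current))
    if (length(missing_vars) > 0) {
      new_cols <- ft[rows, missing_vars]
      current[, (missing_vars) := new_cols]
      data(current)
    }
  })
}
```

A UI with, for example, a selectInput named extra_vars would then trigger the additional loading step whenever the selection changes.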

Feedback

Let me know if you had similar problems and how you solved them. Maybe you have some ideas for improvement.
