Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I had several projects where I had to load a big dataset into my Shiny app. The loading was usually done at startup and took more than 3 minutes. My goal was to reduce this time. I started thinking about the problem and realized that the whole dataset is not required when the app starts.
Fast and flexible data loading with fst
My first idea was to use a database, for example RSQLite. I also liked MonetDB a lot (much faster than RSQLite), but I could not get it to run on the system where it was required, so I searched for alternatives.
Then I discovered the fst R package. It saves datasets (data.frame/data.table) in the fst format. Data loading is reasonably fast. Moreover, one can access rows and columns of the dataset without loading the whole dataset, so it provides some of the functionality of a database. Loading part of the dataset is much faster than loading all of it.
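As a rough illustration of partial reading, here is a minimal sketch. The toy data frame and the temporary file path are invented for this example and are not taken from the app described in this post. It relies on the `columns`, `from` and `to` arguments of `read_fst()`:

```r
library(fst)

# Toy dataset, invented for illustration only
df <- data.frame(
  ID      = 1:1000,
  year    = rep(2001:2010, each = 100),
  outcome = rnorm(1000)
)

# Write the file once; the path is a temporary file for this sketch
path <- tempfile(fileext = ".fst")
write_fst(df, path)

# Read only two columns and only the first 100 rows
part <- read_fst(path, columns = c("ID", "year"), from = 1, to = 100)
dim(part)
```

Only the requested columns and row range are read from disk, which is what makes the partial load cheaper than reading the full file.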
So this was the functionality I needed for speeding up the data loading process.
My strategy in the Shiny App was then the following:
- At startup, load only the data that the dashboard really needs right away.
- Data transformations that are always required are done once beforehand in a data preparation script.
- Metadata such as the possible choices for inputs is saved in a separate .RData file called meta_data.RData.
- Updating this metadata is part of the data preparation process. The metadata is small and is loaded at every start of the app.
- If further columns or rows are needed later, they are loaded into the app on demand and added to the existing dataset.
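The preparation step in this list could look roughly like the following. This is a sketch under assumptions: the toy dataset, the `outcome_log` transformation, the `data.fst` file name and the use of `tempdir()` are all hypothetical; only the `meta_data.RData` file name comes from the list above.

```r
library(fst)

# --- data preparation script (run once, outside the app) ---
# `raw` stands in for the real dataset; its columns are hypothetical
raw <- data.frame(
  ID      = 1:6,
  year    = c(2015, 2015, 2016, 2016, 2017, 2017),
  outcome = c(0.3, 1.2, 0.7, 2.1, 1.5, 0.9),
  region  = c("north", "south", "north", "south", "north", "south")
)

out_dir   <- tempdir()  # assumption: replace with the app's data folder
fst_path  <- file.path(out_dir, "data.fst")
meta_path <- file.path(out_dir, "meta_data.RData")

# Transformations that are always needed happen here, not in the app
raw$outcome_log <- log1p(raw$outcome)
write_fst(raw, fst_path)

# Small metadata object, e.g. the choices offered by the inputs
meta_data <- list(
  years   = sort(unique(raw$year)),
  regions = sort(unique(raw$region)),
  columns = colnames(raw)
)
save(meta_data, file = meta_path)

# --- at app start: only the small metadata file is loaded ---
rm(meta_data)
load(meta_path)
meta_data$years
```

Because the metadata is stored separately, the app can populate its inputs without touching the large fst file at all.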
All in all, this reduced the loading time at app startup from 3 minutes to 5 seconds. Loading data later (usually only 1 or 2 columns at once) had no notable effect on the app's performance.
Some technical details
Here are some technical details on how I implemented this in my app. The following packages are required:
library(shiny)
library(fst)
library(data.table)
I initialize the data as an empty reactiveVal, and create reactiveValues for the fst file and the selected rows and columns of the fst file.
data = reactiveVal(NULL)
tmp_all = reactiveValues(fst = NULL, rows_fst = NULL, cols_fst = NULL)
Then I open the fst file that I saved beforehand with write_fst. Note that this command does not load the dataset yet.
tmp_fst = fst(my_path)
I specify the rows and columns I want to load and save them in tmp_all:
# bis ("until") is the upper year limit, defined elsewhere in the app
rows_fst = tmp_fst$year <= bis
cols_fst = c("ID", "year", "outcome")
tmp_all$rows_fst = rows_fst
tmp_all$fst = tmp_fst
Then I load the dataset as data.table and save it in the reactive Value:
# use cols_fst from above; setDT() converts the data.frame by reference
tmp = setDT(tmp_fst[rows_fst, cols_fst])
data(tmp)
I wrote a function to add variables afterwards. I test first if they are already available and only add new variables:
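add_variable <- function(tmp, tmp_all, new_vars) {
  # keep only the variables that are not in the dataset yet
  inputs = new_vars[!(new_vars %in% colnames(tmp))]
  if (length(inputs) > 0) {
    tmp_fst = tmp_all$fst
    rows_fst = tmp_all$rows_fst
    # read the missing columns for the same rows as the initial load
    tmp_calc = tmp_fst[rows_fst, inputs, drop = FALSE]
    # append the new columns to the data.table by reference
    tmp = tmp[, (inputs) := tmp_calc]
  }
  return(tmp)
}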
Finally the variables are added to the dataset and are now readily available for the Shiny app.
new_vars = c("the_new_variable")
tmp <- add_variable(tmp = data(), tmp_all, new_vars)
data_filtered(tmp)
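To try the same flow outside of Shiny, the following standalone sketch may help. It is an illustration under assumptions, not the app's code: the toy data, the temporary file and the fixed year limit of 2016 are invented, and the reactive values are replaced by plain variables.

```r
library(fst)
library(data.table)

# Toy data written to a temporary fst file (invented for this sketch)
full <- data.frame(
  ID               = 1:8,
  year             = rep(2015:2018, each = 2),
  outcome          = c(1, 2, 3, 4, 5, 6, 7, 8),
  the_new_variable = letters[1:8],
  stringsAsFactors = FALSE
)
my_path <- tempfile(fileext = ".fst")
write_fst(full, my_path)

tmp_fst  <- fst(my_path)          # reference only, the dataset is not loaded
rows_fst <- tmp_fst$year <= 2016  # reads just the year column
cols_fst <- c("ID", "year", "outcome")

# Initial partial load: selected rows and columns only
tmp <- setDT(tmp_fst[rows_fst, cols_fst])

# Later: fetch one more column for the same rows and append it
new_vars <- setdiff("the_new_variable", colnames(tmp))
if (length(new_vars) > 0) {
  extra <- tmp_fst[rows_fst, new_vars, drop = FALSE]
  tmp[, (new_vars) := extra]
}
colnames(tmp)
```

Reusing the stored row selection for the later read keeps the new column aligned with the rows that were loaded initially.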
Feedback
Let me know if you had similar problems and how you solved them. Maybe you have some ideas for improvement.