Speed up your R scripts: a cool, optimized way to load, write and store big data frames with the fst package!
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Are you trying to save and load a big dataset in R? Here we show you how the fst CRAN package can give your scripts a performance boost and reduce the disk space your files take up. (fst works on data frames; a DL model or any other arbitrary R object still needs saveRDS() or similar.) We are going to benchmark it against base R functions (CSV and RDS formats) and against another great package, readr:
library(tidyverse)
big_dataset %>% nrow() # 700k rows, 15 cols (8 factor, 4 int, 3 logi)

library(microbenchmark)
library(readr)
library(fst)

microbenchmark(
  write.csv(big_dataset, paste0(path, "big_dataset.csv")),     # utils
  write_csv(big_dataset, paste0(path, "big_dataset.csv")),     # readr
  write_csv(big_dataset, paste0(path, "big_dataset.csv.gz")),  # readr, gzip-compressed
  saveRDS(big_dataset, paste0(path, "big_dataset.RDS")),       # base
  write_rds(big_dataset, paste0(path, "big_dataset.RDS")),     # readr
  write_fst(big_dataset, paste0(path, "big_dataset.fst")),     # fst
  times = 10
)

## Unit: milliseconds
## expr                  min        mean      median        max neval file_size
## write.csv      10943.1161 11232.20073 11098.66610 12011.1538    10    109 MB
## write_csv       3140.4450  3442.92772  3388.14280  3768.4109    10    109 MB
## write_csv (gz)  6993.8850  7332.31976  7260.95040  7946.9233    10     23 MB
## saveRDS         4800.3516  5122.22345  5024.69395  5833.9807    10     15 MB
## write_rds        187.0765   210.74584   211.70760   246.6369    10     46 MB
## write_fst         60.3065    87.30611    74.94375   154.7718    10     16 MB
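The post does not show how big_dataset or path are created. To rerun a similar benchmark, here is a minimal sketch that builds a hypothetical synthetic stand-in with the same shape (700k rows: 8 factor, 4 integer, 3 logical columns). The helper make_big_dataset(), the column names, the factor levels and the value ranges are all assumptions, so timings and file sizes will differ from the table above.

```r
# Hypothetical stand-in for `big_dataset`: the original data is not shown,
# so the column names, factor levels and value ranges are assumptions.
make_big_dataset <- function(n_rows = 700000L, seed = 42L) {
  set.seed(seed)
  cols <- list()
  # 8 factor columns, each drawn from 10 made-up levels
  for (i in 1:8) {
    cols[[paste0("fct_", i)]] <- factor(
      sample(paste0("level_", 1:10), n_rows, replace = TRUE)
    )
  }
  # 4 integer columns with values between 1 and 1000
  for (i in 1:4) {
    cols[[paste0("int_", i)]] <- sample.int(1000L, n_rows, replace = TRUE)
  }
  # 3 logical columns
  for (i in 1:3) {
    cols[[paste0("lgl_", i)]] <- sample(c(TRUE, FALSE), n_rows, replace = TRUE)
  }
  as.data.frame(cols)
}

big_dataset <- make_big_dataset()

# Hypothetical output directory; the post's own `path` is not shown.
# The trailing "/" matters because the benchmark uses paste0(path, "file").
path <- paste0(tempdir(), "/")
```

With these two objects defined, the microbenchmark() call above can be run unchanged.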
Wow! That was cool! fst gives us an amazing writing speed plus a file (16 MB) nearly as small as the compressed RDS one (15 MB)!
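The benchmark above only times writing. On the loading side, fst's read_fst() can read a whole file back or, thanks to fst's random access, only selected columns and a row range. A minimal sketch, using a small hypothetical data frame (dat) in place of the real dataset:

```r
library(fst)

# Small hypothetical data frame; substitute your own `big_dataset`.
dat <- data.frame(
  id    = 1:100000,
  group = factor(sample(c("a", "b", "c"), 100000, replace = TRUE)),
  flag  = sample(c(TRUE, FALSE), 100000, replace = TRUE)
)

fst_file <- tempfile(fileext = ".fst")
write_fst(dat, fst_file)

# Full load: every column and every row
full <- read_fst(fst_file)

# Random access: only two columns, rows 1001 to 2000
part <- read_fst(fst_file, columns = c("id", "flag"), from = 1001, to = 2000)

# Time the full load, e.g. to compare against readRDS() or readr::read_csv()
system.time(read_fst(fst_file))
```

Reading only the columns and rows you need is where fst can save the most time on very wide or very long tables.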
We can see roughly x3 and over x50 speed-ups over the readr::write_rds() and base saveRDS() functions!
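Much of the speed and size spread between the RDS writers comes from compression defaults: base saveRDS() gzip-compresses by default, readr::write_rds() writes uncompressed unless its compress argument is set, and write_fst() takes a compress value from 0 to 100 (default 50). A minimal sketch comparing file sizes on a small hypothetical data frame; the helper file_size_of() is made up for illustration, and saveRDS(compress = FALSE) is used as a rough approximation of write_rds()'s default output.

```r
library(fst)

set.seed(1)
# Small hypothetical data frame; substitute your own `big_dataset`.
dat <- data.frame(
  x = sample.int(100L, 200000, replace = TRUE),
  g = factor(sample(letters[1:5], 200000, replace = TRUE))
)

# Hypothetical helper: write with `writer`, return the file size in bytes
file_size_of <- function(writer) {
  f <- tempfile()
  on.exit(unlink(f))  # remove the temp file after the size is returned
  writer(f)
  file.size(f)
}

sizes <- c(
  rds_gzip    = file_size_of(function(f) saveRDS(dat, f)),                    # base default: gzip
  rds_none    = file_size_of(function(f) saveRDS(dat, f, compress = FALSE)),  # roughly write_rds() default
  fst_none    = file_size_of(function(f) write_fst(dat, f, compress = 0)),
  fst_default = file_size_of(function(f) write_fst(dat, f)),                  # compress = 50
  fst_max     = file_size_of(function(f) write_fst(dat, f, compress = 100))
)
print(sizes)
```

Wrapping the same writer calls in microbenchmark() instead of file_size_of() shows the time side of the trade-off.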
fst is also over x100 faster than write.csv(), but the truth is that they are not directly comparable, as they produce quite different file formats.
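One concrete consequence of the format difference: CSV is plain text, so column types have to be guessed again when the file is read, whereas fst stores them in the file. A small sketch with a hypothetical three-row data frame; the read.csv() result for the factor column assumes R 4.0 or later, where stringsAsFactors defaults to FALSE.

```r
library(fst)

# Tiny hypothetical data frame with a factor, an integer and a logical column
dat <- data.frame(
  level = factor(c("low", "high", "low")),
  count = c(1L, 2L, 3L),
  flag  = c(TRUE, FALSE, TRUE)
)

csv_file <- tempfile(fileext = ".csv")
fst_file <- tempfile(fileext = ".fst")
write.csv(dat, csv_file, row.names = FALSE)
write_fst(dat, fst_file)

# CSV is plain text: read.csv() re-guesses each column's type from the text
from_csv <- read.csv(csv_file)
# fst is binary and keeps the column types, including factor levels
from_fst <- read_fst(fst_file)

sapply(from_csv, class)
sapply(from_fst, class)
```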
Are you going to add fst to your R toolbox too?
See related useful tips on TypeThePipe