Faster Way to Slice Dataframe by Row
[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When we’d like to slice a dataframe by row, we can employ the split() function or the iter() function in the iterators package.
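As a minimal sketch of those two approaches, the snippet below slices a small made-up data frame into a list of one-row data frames and then filters it. The `toy` data frame and its column names are hypothetical stand-ins, not the data used in this post, and the `iter()` call is guarded in case the iterators package is not installed.

```r
# Toy data frame standing in for the real data (hypothetical values).
toy <- data.frame(id = 1:4, derog = c(0, 10, 2, 10))

# split() with one group per row returns a list of one-row data frames.
rows_split <- split(toy, seq(nrow(toy)))

# iter(by = "row") yields the same rows one at a time; as.list() collects them.
# Guarded so the sketch still runs when iterators is not installed.
if (requireNamespace("iterators", quietly = TRUE)) {
  rows_iter <- as.list(iterators::iter(toy, by = "row"))
}

# Keep the rows that satisfy a condition and bind them back together.
hits <- Reduce(rbind, Filter(function(x) x$derog == 10, rows_split))
```

Either list can be passed to Filter() in the same way, since both hold one data frame per row.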
By leveraging parallelism, I wrote a utility function, slice(), that slices the dataframe faster. In the example shown below, slice() is about 3 times faster than split() or iter() when selecting 2 records out of 5,960 rows.
    df <- read.csv("hmeq.csv")
    nrow(df)
    # [1] 5960

    # Return a list of one-row data frames, extracting the rows in parallel.
    # mcMap() relies on forking, so mc.cores > 1 is not supported on Windows.
    slice <- function(df) {
      return(parallel::mcMap(function(i) df[i, ], seq(nrow(df)),
                             mc.cores = parallel::detectCores()))
    }

    # Select the records with DEROG equal to 10.
    Reduce(rbind, Filter(function(x) x$DEROG == 10, slice(df)))
    #     BAD  LOAN MORTDUE VALUE  REASON   JOB YOJ DEROG DELINQ     CLAGE NINQ CLNO  DEBTINC
    #3094   1 16800   16204 27781 HomeImp Other   1    10      0 190.57710    0    9 27.14689
    #3280   1 17500   76100 98500 DebtCon Other   5    10      1  59.83333    5   16       NA

    # Compare the three slicing approaches on the same filter.
    rbenchmark::benchmark(replications = 10, order = "elapsed", relative = "elapsed",
      columns = c("test", "replications", "elapsed", "relative"),
      "SPLIT" = Reduce(rbind, Filter(Negate(function(x) x$DEROG != 10), split(df, seq(nrow(df))))),
      "ITER " = Reduce(rbind, Filter(Negate(function(x) x$DEROG != 10), as.list(iterators::iter(df, by = "row")))),
      "SLICE" = Reduce(rbind, Filter(Negate(function(x) x$DEROG != 10), slice(df)))
    )
    #   test replications elapsed relative
    #  SLICE           10   2.224    1.000
    #  SPLIT           10   7.185    3.231
    #  ITER            10   7.375    3.316
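Because mcMap() is built on forking, asking for more than one core fails on Windows. A hedged variant that falls back to a single core there might look like the sketch below. The name `slice_portable` and its `cores` argument are hypothetical and not part of the original post, and the toy data frame is made up for illustration.

```r
# Hypothetical portable variant of slice(); not from the original post.
slice_portable <- function(df, cores = parallel::detectCores()) {
  # Forking is unavailable on Windows, and detectCores() may return NA,
  # so fall back to a single core in either case.
  if (.Platform$OS.type == "windows" || is.na(cores)) cores <- 1L
  parallel::mcMap(function(i) df[i, ], seq_len(nrow(df)), mc.cores = cores)
}

# Usage on a small made-up data frame.
toy <- data.frame(id = 1:4, derog = c(0, 10, 2, 10))
rows <- slice_portable(toy)
hits <- Reduce(rbind, Filter(function(x) x$derog == 10, rows))
```

With a single core the function does the same work serially, so any speed-up over split() should only be expected on systems where forking is available.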