Shuffling Columns With data.table
[This article was first published on tshafer.com, and kindly contributed to R-bloggers].
Yesterday, in a post syndicated to R-bloggers, kjytay asked how to programmatically shuffle a data.table column in place, because the straightforward approach didn't work as expected.
Here are two other ways to solve the same problem, one using data.table::set() and the other using .SDcols:
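```r
library(data.table)

# Shuffle one column by reference with set(), which skips the [.data.table overhead
scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}

# Shuffle one column by reference by permuting the rows of .SD restricted to that column
scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}
```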
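As a quick sanity check, here is a minimal sketch showing that both functions modify the table by reference, so the result never has to be assigned back, and that each shuffled column keeps exactly its original values. The table dt and its columns are made up for illustration.

```r
library(data.table)

scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}
scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}

# Hypothetical example table
dt <- data.table(x = 1:10, y = 1:10)
set.seed(1)

# Neither result is assigned back to dt: both calls modify it in place
scramble_set(dt, "x")
scramble_sd(dt, "y")

# Each column is still a permutation of 1:10, and no rows were added or lost
stopifnot(
  nrow(dt) == 10,
  identical(sort(dt$x), 1:10),
  identical(sort(dt$y), 1:10)
)
```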
Each approach returns the correct result and avoids the strange name-resolution problem that appears when the column to shuffle is itself named “colname”.
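To see this in action, the sketch below shuffles a hypothetical column that is literally named "colname", the same name as the functions' argument. As I understand data.table's evaluation rules, set() receives colname as an ordinary string, while the c(colname) left-hand side and .SDcols are evaluated in the calling function's scope, so neither function mistakes the argument for the column.

```r
library(data.table)

scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}
scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}

# Hypothetical tables whose only column shares its name with the argument
dt_set <- data.table(colname = letters[1:6])
dt_sd  <- data.table(colname = letters[1:6])

set.seed(2)
scramble_set(dt_set, "colname")
scramble_sd(dt_sd, "colname")

# Both still have one column called "colname" holding the same six letters
stopifnot(
  identical(names(dt_set), "colname"),
  identical(names(dt_sd), "colname"),
  identical(sort(dt_set$colname), letters[1:6]),
  identical(sort(dt_sd$colname), letters[1:6])
)
```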
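It’s good to check performance with these kinds of things, too, especially when .SD is involved. Here set() handily outperforms the other two solutions: its median time is about 8.5 times lower than that of kjytay’s original solution (which I named “orig”) and about 16 times lower than the .SD version: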
```r
library(data.table)
library(microbenchmark)

# scramble_orig() is kjytay's original solution, not reproduced here
microbenchmark(
  orig = scramble_orig(input_dt, "x"),
  set  = scramble_set(input_dt, "x"),
  sd   = scramble_sd(input_dt, "x"),
  # Rebuild the table and reset the seed before every evaluation
  setup = {
    input_dt <- data.table(x = 1:5)
    set.seed(1)
  },
  check = "identical"
)
```

```
Unit: microseconds
 expr     min       lq      mean  median       uq      max neval
 orig 291.970 315.4400 351.52132 319.474 327.5635 3248.663   100
  set  33.196  36.0965  61.62936  37.262  39.5380 2419.880   100
   sd 557.834 591.2370 636.88657 597.579 616.2675 3821.737   100
```
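The check = "identical" argument can pass because setup resets the seed before every evaluation and, as far as I can tell from base R's sample() source, both sample(x) and sample(.I, .N) reduce to indexing by sample.int(n, n), so they draw the same permutation. The sketch below checks that directly on a larger table; the helper make_dt() and the size n = 1e5 are hypothetical choices for illustration, not part of the benchmark above.

```r
library(data.table)

scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}
scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}

# Hypothetical helper: a fresh table with a single integer column
make_dt <- function(n) data.table(x = seq_len(n))

n <- 1e5
a <- make_dt(n)
b <- make_dt(n)

# Reset the seed before each shuffle, mirroring the benchmark's setup block
set.seed(1)
scramble_set(a, "x")
set.seed(1)
scramble_sd(b, "x")

# Same seed, same permutation: the shuffled columns match element for element
stopifnot(identical(a$x, b$x))
```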
This post is kindly republished by R-bloggers.