[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In R, there are a couple of ways to convert a column-oriented data frame to a row-oriented list of dictionaries or something similar, e.g. a list of lists.
In the code snippet below, I show each approach and how to extract keys and values from the resulting dictionary. In the benchmark at the end, the plain base R list is the fastest of the six approaches: by elapsed time it is roughly 1.9 times faster than an environment and about 21 times faster than a Python dictionary created through reticulate.
```r
### LIST() FUNCTION IN BASE PACKAGE ###
x1 <- as.list(iris[1, ])
names(x1)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
x1[["Sepal.Length"]]
# [1] 5.1

### ENVIRONMENT-BASED SOLUTION ###
envn_dict <- function(x) {
  e <- new.env(hash = TRUE)
  for (name in names(x)) assign(name, x[, name], e)
  return(e)
}

x2 <- envn_dict(iris[1, ])
ls(x2)
# [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"  "Species"
x2[["Sepal.Length"]]
# [1] 5.1

### COLLECTIONS PACKAGE ###
coll_dict <- function(x) {
  d <- collections::Dict$new()
  for (name in names(x)) d$set(name, x[, name])
  return(d)
}

x3 <- coll_dict(iris[1, ])
x3$keys()
# [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"  "Species"
x3$get("Sepal.Length")
# [1] 5.1

### HASH PACKAGE ###
hash_dict <- function(x) {
  d <- hash::hash()
  for (name in names(x)) d[[name]] <- x[, name]
  return(d)
}

x4 <- hash_dict(iris[1, ])
hash::keys(x4)
# [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"  "Species"
hash::values(x4, "Sepal.Length")
# Sepal.Length
#          5.1

### DATASTRUCTURES PACKAGE ###
data_dict <- function(x) {
  d <- datastructures::hashmap()
  for (name in names(x)) d[name] <- x[, name]
  return(d)
}

x5 <- data_dict(iris[1, ])
datastructures::keys(x5)
# [1] "Species"      "Sepal.Width"  "Petal.Length" "Sepal.Length" "Petal.Width"
datastructures::get(x5, "Sepal.Length")
# [1] 5.1

### FROM PYTHON ###
py2r_dict <- function(x) {
  return(reticulate::py_dict(names(x), x, TRUE))
}

x6 <- py2r_dict(iris[1, ])
x6$keys()
# [1] "Petal.Length" "Sepal.Length" "Petal.Width"  "Sepal.Width"  "Species"
x6["Sepal.Length"]
# [1] 5.1

### CONVERT DATAFRAME TO DICTIONARY LIST ###
to_list <- function(df, fn) {
  l <- list()
  for (i in seq(nrow(df))) l[[i]] <- fn(df[i, ])
  return(l)
}

rbenchmark::benchmark(
  replications = 100, order = "elapsed", relative = "elapsed",
  columns = c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
  "BASE::LIST"              = to_list(iris, as.list),
  "BASE::ENVIRONMENT"       = to_list(iris, envn_dict),
  "COLLECTIONS::DICT"       = to_list(iris, coll_dict),
  "HASH::HASH"              = to_list(iris, hash_dict),
  "DATASTRUCTURES::HASHMAP" = to_list(iris, data_dict),
  "RETICULATE::PY_DICT"     = to_list(iris, py2r_dict)
)
#                     test replications elapsed relative user.self sys.self
#1              BASE::LIST          100   0.857    1.000     0.857    0.000
#2       BASE::ENVIRONMENT          100   1.607    1.875     1.607    0.000
#4              HASH::HASH          100   2.600    3.034     2.600    0.000
#3       COLLECTIONS::DICT          100   2.956    3.449     2.956    0.000
#5 DATASTRUCTURES::HASHMAP          100  16.070   18.751    16.071    0.000
#6     RETICULATE::PY_DICT          100  18.030   21.039    18.023    0.008
```
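One detail of the `to_list()` helper is worth noting: it grows its result one element at a time and loops over `seq(nrow(df))`, which evaluates to `c(1, 0)` when the data frame has no rows. As a minimal sketch, the same row-wise conversion could be written with `lapply()` and `seq_len()`. This variant is not part of the original benchmark and was not timed; the name `to_list_safe` is hypothetical and used only for illustration.

```r
# Hypothetical helper (not from the original post): build one dictionary-like
# object per row, with fn as the row-to-dictionary converter.
to_list_safe <- function(df, fn = as.list) {
  # seq_len(0) is empty, so a zero-row data frame gives an empty list
  lapply(seq_len(nrow(df)), function(i) fn(df[i, , drop = FALSE]))
}

rows <- to_list_safe(iris)

length(rows)                 # one element per row of iris
names(rows[[1]])             # keys of the first row
rows[[1]][["Sepal.Length"]]  # value lookup by key in the first row

# Pull the same key out of every row as a numeric vector
sepal_lengths <- vapply(rows, function(r) r[["Sepal.Length"]], numeric(1))

# A zero-row data frame yields an empty list
length(to_list_safe(iris[0, ]))
```

Any of the converters defined earlier, such as `envn_dict`, could be passed as `fn` in place of the default `as.list`.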
To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.