Quick conversion of a list of lists into a data frame
Data frames are one of R’s distinguishing features. Exposing a list of lists as an array of cases, they make many formal operations such as regression or optimization easy to represent.
R's data.frame constructor is quite slow when applied to a list, in large part because it supports a vast amount of functionality. This sample shows one way to write a much faster data frame creator in C++, if one is willing to forgo that generality.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List CheapDataFrameBuilder(List a) {
    // Deep-copy the input so the caller's list is left untouched.
    List returned_frame = clone(a);
    GenericVector sample_row = returned_frame(0);

    // Row names "0", "1", ...; the 5-byte buffer holds at most 4 digits.
    StringVector row_names(sample_row.length());
    for (int i = 0; i < sample_row.length(); ++i) {
        char name[5];
        sprintf(&(name[0]), "%d", i);
        row_names(i) = name;
    }
    returned_frame.attr("row.names") = row_names;

    // Column names "X.0", "X.1", ...; the 6-byte buffer holds at most 3 digits.
    StringVector col_names(returned_frame.length());
    for (int j = 0; j < returned_frame.length(); ++j) {
        char name[6];
        sprintf(&(name[0]), "X.%d", j);
        col_names(j) = name;
    }
    returned_frame.attr("names") = col_names;
    returned_frame.attr("class") = "data.frame";

    return returned_frame;
}
Here is the result of comparing the native function to this version.
library(rbenchmark)
a <- replicate(250, 1:100, simplify=FALSE)
res <- benchmark(as.data.frame(a), CheapDataFrameBuilder(a), order="relative", replications=500)
res[,1:4]
                      test replications elapsed relative
2 CheapDataFrameBuilder(a)          500   0.104      1.0
1         as.data.frame(a)          500  16.730    160.9
There are some subtleties in this code:
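- It turns out that one can't send super-large data frames to it because of possible buffer overflows. The fixed-size buffers (char name[5] and char name[6]) hold at most four digits of row name and three digits of column index, so sprintf writes past them once the input has more than 10,000 rows or more than 1,000 columns. I've never seen that problem when I've written Rcpp functions which exchanged SEXPs with R, but this one uses Rcpp::export in order to use sourceCpp.
- Notice the invocation of clone() in the first line of the function body. If you don't do that, you wind up modifying the caller's list as a side effect, which is not what most people would expect.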
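The buffer-overflow limit in the first point can be avoided by building the names as std::string values instead of writing into fixed-size char arrays. The following is a minimal sketch in plain C++ (no Rcpp, so it compiles on its own); make_names is a hypothetical helper introduced here for illustration and is not part of the original function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not in the original post): returns the labels
// prefix+"0", prefix+"1", ..., prefix+"(n-1)" without any fixed-size buffer.
std::vector<std::string> make_names(const std::string& prefix, int n) {
    assert(n >= 0);  // a length is never negative
    std::vector<std::string> names;
    names.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        // std::to_string sizes its result to fit the number, so there is
        // no buffer to overrun however many digits i has.
        names.push_back(prefix + std::to_string(i));
    }
    return names;
}
```

Calling make_names("", n) would give the row names and make_names("X.", n) the column names. Inside the Rcpp function the same idea should carry over by assigning std::to_string(i) (or "X." + std::to_string(j)) to each StringVector element, on the assumption that Rcpp's string elements accept a std::string on assignment.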