Subset views in R
I don’t know how to do this in R. So let me just say why I can’t.
I wanted something akin to Boost's sub-matrix views, where indexes map back to the original matrix, so you don't create a new object.
It sounds straightforward: just overload '[[' to subtract the offset and check the length. Alas, no dice. R copies objects so zealously that this is not possible (as far as I know, which isn't much).
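For concreteness, here is a minimal sketch of the kind of overload I mean, assuming a hypothetical S3 class called "subview" (the class name, constructor, and field names are made up for illustration and come from no package). It only shows the index arithmetic and the bounds check; it makes no claim that R avoids copying the base vector when the object is built.

```r
# Hypothetical "subview" class: a window of `len` elements starting
# just after position `offset` in `base`. All names are illustrative.
subview <- function(base, offset, len) {
  stopifnot(offset >= 0, len >= 0, offset + len <= length(base))
  structure(list(base = base, offset = offset, len = len),
            class = "subview")
}

# The view's length is the window length, not the base length.
length.subview <- function(x) unclass(x)$len

# '[[' on a view: check the index against the window, then shift it
# by the offset and read from the base vector.
`[[.subview` <- function(x, i) {
  v <- unclass(x)
  if (i < 1 || i > v$len) stop("subscript out of bounds")
  v$base[[i + v$offset]]
}

M <- c(10, 20, 30, 40, 50)
V <- subview(M, offset = 2, len = 3)
V[[1]]   # element 3 of M
```

Whether building V leaves M alone or quietly duplicates it is exactly the question the timings that follow try to answer.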
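To demonstrate, the following function evaluates an expression operating on a vector called "M" of length N, and returns the elapsed time averaged over 10 trials.

    time.op = function(N, Exp) {
        Exp = parse(text=Exp)        # parse the expression text once
        M = numeric(N)               # the vector the expression operates on
        N.Trials = 10
        Times = numeric(N.Trials)
        for (II in 1:N.Trials) {
            Times[[II]] = system.time(eval(Exp))[['elapsed']]
        }
        mean(Times)                  # mean elapsed seconds per evaluation
    }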
For example:

    > time.op(1e5, 'sqrt(M)')
    [1] 0.0024
Next, check whether the size of M affects the time the operation takes:

    time.test = function(Exp) {
        Ns = 10^(1:6)
        Times = sapply(Ns, time.op, Exp=Exp)
        data.frame(Ns, Times)
    }

    > time.test('sqrt(M)')
         Ns  Times
    1 1e+01 0.0000
    2 1e+02 0.0000
    3 1e+03 0.0001
    4 1e+04 0.0004
    5 1e+05 0.0027
    6 1e+06 0.0274

(Obviously it does: sqrt has to touch every element.)
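And here's why we know it's copying: merely wrapping M in a list takes longer as M grows, which it would not if the list just pointed at the existing vector.

    > time.test('list(M)')
         Ns  Times
    1 1e+01 0.0001
    2 1e+02 0.0001
    3 1e+03 0.0002
    4 1e+04 0.0000
    5 1e+05 0.0004
    6 1e+06 0.0086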
Or with attributes:

    > time.test('attr(M, "name") = "mike"')
         Ns  Times
    1 1e+01 0.0000
    2 1e+02 0.0000
    3 1e+03 0.0000
    4 1e+04 0.0000
    5 1e+05 0.0006
    6 1e+06 0.0081
Good luck making a subset without copying!
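Timing is an indirect signal. A more direct check, sketched here on the assumption that your R build has memory profiling enabled (which capabilities("profmem") reports), is base R's tracemem, which prints a line each time the traced object is duplicated. What it prints, if anything, depends on the R version, so no expected output is shown.

```r
# Assumption: R was built with memory profiling; otherwise the
# tracemem() part is skipped entirely.
M <- numeric(1e6)

if (capabilities("profmem")) {
  tracemem(M)                  # report any duplication of M from here on
  L <- list(M)                 # a "tracemem[...]" line appears only if M is copied
  attr(M, "name") <- "mike"    # likewise for the attribute assignment
  untracemem(M)                # stop tracing
}
```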
And here are the relevant parts of the R source code.
Making a list (main/builtin.c)
    for (i = 0; i < n; i++) {
        if (TAG(args) != R_NilValue) {
            SET_STRING_ELT(names, i, PRINTNAME(TAG(args)));
            havenames = 1;
        }
        else {
            SET_STRING_ELT(names, i, R_BlankString);
        }
        if (NAMED(CAR(args)))
            SET_VECTOR_ELT(list, i, duplicate(CAR(args)));
        else
            SET_VECTOR_ELT(list, i, CAR(args));
        args = CDR(args);
    }
    if (havenames) {
        setAttrib(list, R_NamesSymbol, names);
    }
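Note the call to "duplicate" inside the loop: every argument whose NAMED count is non-zero, meaning it may be referenced from somewhere else, gets copied.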
And yes, duplicate does copy, and it is deep (main/duplicate.c):
    case VECSXP:
        n = LENGTH(s);
        PROTECT(s);
        PROTECT(t = allocVector(TYPEOF(s), n));
        for(i = 0 ; i < n ; i++)
            SET_VECTOR_ELT(t, i, duplicate1(VECTOR_ELT(s, i)));
        DUPLICATE_ATTRIB(t, s);
        SET_TRUELENGTH(t, TRUELENGTH(s));
        UNPROTECT(2);
        break;
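From the R side, the visible consequence is plain value semantics: changing an element nested inside one list never shows up in another list that was assigned from it. The small sketch below (variable names are arbitrary) shows only that guarantee; it says nothing about when the copy is actually made.

```r
a <- list(x = c(1, 2, 3))
b <- a              # b behaves as an independent copy of a
b$x[[1]] <- 99      # modify a vector nested inside b

a$x[[1]]            # still 1: the nested vector in a is untouched
b$x[[1]]            # 99
```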