Subset views in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I don’t know how to do this in R. So let me just say why I can’t.
I wanted something akin to Boost’s sub-matrix views, where indexes into the view map back to the original matrix, so you don’t create a new object.
Sounds straightforward: just overload ‘[[’ to subtract the offset and check the length. Alas, no dice. R zealously copies objects, to the point that this is not (as far as I know, which isn’t much) possible.
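A minimal sketch of that idea, using a hypothetical S3 class `subview` (the names `make_view` and `subview` are made up for illustration, not from any package): the object holds the parent vector, an offset and a length, and `[[` checks the index against the length before mapping view index i to parent index offset + i. It works as an interface, but note where the parent ends up: inside a `list()`, which in the R version measured here is exactly the step that duplicates it (see the timings and the builtin.c excerpt further down).

```r
# Hypothetical S3 class "subview": the parent vector plus an offset and a length.
make_view = function(parent, offset, len) {
  stopifnot(offset >= 0, len >= 0, offset + len <= length(parent))
  # Storing 'parent' in a list is the step that, per the post, triggers a copy.
  structure(list(parent = parent, offset = offset, len = len),
            class = "subview")
}

# Map a view index back to the parent: check the length, then apply the offset.
"[[.subview" = function(x, i) {
  v = unclass(x)
  if (i < 1 || i > v$len) stop("subscript out of bounds")
  v$parent[[v$offset + i]]
}

# Report the view's length rather than the length of the underlying list.
length.subview = function(x) unclass(x)$len
```

For example, `make_view(c(10, 20, 30, 40, 50), offset = 1, len = 3)[[1]]` returns 20, and asking for element 4 of that view is an error.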
To demonstrate, the following function parses an expression string, evaluates it against a numeric vector called “M” of length N, and returns the mean elapsed time over 10 trials.
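time.op = function(N, Exp) {
  Exp = parse(text=Exp)       # turn the string into an unevaluated expression
  M = numeric(N)              # the vector the expression operates on
  N.Trials = 10
  Times = numeric(N.Trials)
  for (II in 1:N.Trials) {
    # Exp is evaluated in this function's frame, so it can refer to M
    Times[[II]] = system.time(eval(Exp))[['elapsed']]
  }
  mean(Times)                 # mean elapsed seconds per evaluation
}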
For example:
> time.op(1e5, 'sqrt(M)')
[1] 0.0024
Then check: does the size of M affect the time of the operation?
time.test = function(Exp) {
  Ns = 10^(1:6)
  Times = sapply(Ns, time.op, Exp=Exp)
  data.frame(Ns, Times)
}
> time.test('sqrt(M)')
Ns Times
1 1e+01 0.0000
2 1e+02 0.0000
3 1e+03 0.0001
4 1e+04 0.0004
5 1e+05 0.0027
6 1e+06 0.0274
(Obviously: sqrt(M) has to touch every element, so its time grows with N.)
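One way to put a number on “does the size matter” is to regress log time on log N: a slope near 1 means time grows in proportion to N, a slope near 0 means it does not depend on N. The helper below is hypothetical, not part of the original harness; rows with a measured time of 0 are dropped because their log is -Inf.

```r
# Hypothetical helper: slope of log10(Times) vs log10(Ns) for a data frame
# shaped like the output of time.test(). Zero timings are dropped (log10(0) is -Inf).
scaling_exponent = function(df) {
  keep = df$Times > 0
  fit = lm(log10(Times) ~ log10(Ns), data = df[keep, ])
  coef(fit)[[2]]
}

# The sqrt(M) timings from the table above
sqrt_times = data.frame(Ns = 10^(1:6),
                        Times = c(0, 0, 0.0001, 0.0004, 0.0027, 0.0274))
scaling_exponent(sqrt_times)   # about 0.81
```

The slope comes out a little under 1, likely because the small-N timings sit near the resolution of system.time(); the last step, from 1e5 to 1e6, is almost exactly tenfold.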
And here’s how we know it’s copying: merely wrapping M in a list also takes time that grows with N.
> time.test('list(M)')
Ns Times
1 1e+01 0.0001
2 1e+02 0.0001
3 1e+03 0.0002
4 1e+04 0.0000
5 1e+05 0.0004
6 1e+06 0.0086
Or by setting an attribute:
> time.test('attr(M, "name") = "mike"')
Ns Times
1 1e+01 0.0000
2 1e+02 0.0000
3 1e+03 0.0000
4 1e+04 0.0000
5 1e+05 0.0006
6 1e+06 0.0081
Good luck making a subset without copying!
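The flip side, seen from R code: a subset is its own object, so changing the original afterwards does not show through the way it would with a Boost view. A minimal illustration:

```r
M = c(1, 2, 3, 4, 5)
S = M[2:4]   # a new vector holding copies of elements 2..4, not a view
M[3] = 100   # modify the original
S[[2]]       # still 3; S does not see the change
```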
And here are the relevant parts of the R source.
Making a list (main/builtin.c):
for (i = 0; i < n; i++) {
    if (TAG(args) != R_NilValue) {
        SET_STRING_ELT(names, i, PRINTNAME(TAG(args)));
        havenames = 1;
    }
    else {
        SET_STRING_ELT(names, i, R_BlankString);
    }
    if (NAMED(CAR(args)))
        SET_VECTOR_ELT(list, i, duplicate(CAR(args)));
    else
        SET_VECTOR_ELT(list, i, CAR(args));
    args = CDR(args);
}
if (havenames) {
    setAttrib(list, R_NamesSymbol, names);
}
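Note the call to duplicate inside the loop: any argument that is NAMED (for instance, because it is bound to a variable, as M is) gets copied before it is stored in the list.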
And yes, duplicate does copy, and it is deep (main/duplicate.c):
case VECSXP:
    n = LENGTH(s);
    PROTECT(s);
    PROTECT(t = allocVector(TYPEOF(s), n));
    for(i = 0 ; i < n ; i++)
        SET_VECTOR_ELT(t, i, duplicate1(VECTOR_ELT(s, i)));
    DUPLICATE_ATTRIB(t, s);
    SET_TRUELENGTH(t, TRUELENGTH(s));
    UNPROTECT(2);
    break;