Subset views in R

[This article was first published on Struggling Through Problems » R, and kindly contributed to R-bloggers.]

I don’t know how to do this in R. So let me just say why I can’t.

I wanted something akin to Boost’s sub-matrix views, where indexes into the view map back to the original matrix, so that no new object is created.

It sounds straightforward: just overload ‘[[’ to apply the offset and check the length. Alas, no dice. R copies objects so zealously that this is not possible (as far as I know, which isn’t much).
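A minimal sketch of that idea, assuming a hypothetical S3 class called view (the names view, base, offset and len are made up for illustration and are not from any package). It only shows the index mapping; it does nothing to stop R from duplicating the underlying vector when the view is built, which is the problem the timings below probe.

```r
# Hypothetical constructor: wraps a vector together with an offset and a length.
view = function(base, offset, len) {
        stopifnot(offset >= 0, offset + len <= length(base))
        structure(list(base=base, offset=offset, len=len), class="view")
}

# Map an index into the view back to an index into the wrapped vector.
# unclass() avoids re-dispatching to this method when reading the fields.
`[[.view` = function(x, i) {
        parts = unclass(x)
        if (i < 1 || i > parts$len) stop("index out of bounds")
        parts$base[[i + parts$offset]]
}

length.view = function(x) unclass(x)$len

M = (1:10)^2
V = view(M, offset=3, len=4)
V[[1]]      # reads M[[4]]
length(V)
```

The indexing itself is easy; the catch is that the list inside the view holds M by value, so the view is not guaranteed to share storage with the original.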

To demonstrate, the following function executes and times expressions operating on a vector called “M”.

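# Time an expression (given as a string) that operates on a
# zero-filled vector M of length N; returns the mean elapsed seconds.
time.op = function(N, Exp) {
        Exp = parse(text=Exp)
        M = numeric(N)

        N.Trials = 10
        Times = numeric(N.Trials)

        for (II in 1:N.Trials) {
                # eval() runs in this frame, so Exp sees the local M
                Times[[II]] = system.time(eval(Exp))[['elapsed']]
        }

        mean(Times)
}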

For example:

> time.op(1e5, 'sqrt(M)')
[1] 0.0024

Next, check whether the size of M affects the time the operation takes:

time.test = function(Exp) {
        Ns = 10^(1:6)
        Times = sapply(Ns, time.op, Exp=Exp)

        data.frame(Ns, Times)
}

> time.test('sqrt(M)')
     Ns  Times
1 1e+01 0.0000
2 1e+02 0.0000
3 1e+03 0.0001
4 1e+04 0.0004
5 1e+05 0.0027
6 1e+06 0.0274

(Obviously it does: sqrt has to visit every element.)
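One rough way to read that table is to look at how much the time grows for each tenfold increase in N. The snippet below is only a sketch: it hard-codes the three largest timings from the sqrt(M) output above (the smaller sizes are below the timer's resolution), and the variable name Growth is arbitrary.

```r
# Timings copied from the sqrt(M) table above.
Ns    = 10^(4:6)
Times = c(0.0004, 0.0027, 0.0274)

# Ratio of each timing to the previous one; values near 10 mean
# the cost grows roughly in proportion to length(M).
Growth = Times[-1] / Times[-length(Times)]
data.frame(From=Ns[-length(Ns)], To=Ns[-1], Growth)
```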

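And here’s how we know R is copying. Wrapping M in a list should take the same time whatever the length of M, yet the time grows with N: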

> time.test('list(M)')
     Ns  Times
1 1e+01 0.0001
2 1e+02 0.0001
3 1e+03 0.0002
4 1e+04 0.0000
5 1e+05 0.0004
6 1e+06 0.0086

Or with attributes:

> time.test('attr(M, "name") = "mike"')
     Ns  Times
1 1e+01 0.0000
2 1e+02 0.0000
3 1e+03 0.0000
4 1e+04 0.0000
5 1e+05 0.0006
6 1e+06 0.0081

Good luck making a subset without copying!
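Timing is an indirect signal. A more direct check, sketched here on the assumption that R was built with memory profiling (which capabilities("profmem") reports), is to mark the vector with tracemem() and watch for messages. What actually gets duplicated is an implementation detail, so the result may differ between R versions.

```r
M = numeric(1e6)

# tracemem() is only available when R was built with memory profiling.
Traced = capabilities("profmem")
if (Traced) tracemem(M)

L = list(M)                # a tracemem line is printed only if M is duplicated
attr(M, "name") = "mike"   # likewise

if (Traced) untracemem(M)
```

If nothing is printed between the two tracemem calls, that particular operation did not duplicate M in the R version at hand.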

And here are the relevant parts of the R source.

Making a list (main/builtin.c)

for (i = 0; i < n; i++) {
    if (TAG(args) != R_NilValue) {
        SET_STRING_ELT(names, i, PRINTNAME(TAG(args)));
        havenames = 1;
    }
    else {
        SET_STRING_ELT(names, i, R_BlankString);
    }
    if (NAMED(CAR(args)))
        SET_VECTOR_ELT(list, i, duplicate(CAR(args)));
    else
        SET_VECTOR_ELT(list, i, CAR(args));
    args = CDR(args);
}
if (havenames) {
    setAttrib(list, R_NamesSymbol, names);
}

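Note the call to “duplicate” inside the loop: every argument whose NAMED count is nonzero gets copied, and that includes anything bound to a variable such as M.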

And yes, duplicate does copy, and it is deep (main/duplicate.c):

case VECSXP:
        n = LENGTH(s);
        PROTECT(s);
        PROTECT(t = allocVector(TYPEOF(s), n));
        for(i = 0 ; i < n ; i++)
            SET_VECTOR_ELT(t, i, duplicate1(VECTOR_ELT(s, i)));
        DUPLICATE_ATTRIB(t, s);
        SET_TRUELENGTH(t, TRUELENGTH(s));
        UNPROTECT(2);
        break;
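Seen from R code, the consequence of these value semantics is that a list never acts as a reference to the object it was built from. A small illustration follows (the variable names are arbitrary). It shows the observable behaviour only, and says nothing about when the interpreter actually performs the copy.

```r
Inner = list(a=numeric(3))
Outer = list(Inner)

# Modify a vector nested two levels down inside Outer.
Outer[[1]]$a[[2]] = 7

Inner$a        # unchanged: the original is not affected
Outer[[1]]$a   # carries the modification
```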
