Site icon R-bloggers

Indexing Nested Lists

[This article was first published on [R]appster, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve long searched for a somewhat efficient approach to indexing nested lists and/or environments and here’s my best solution so far.

For me, being able to compute such an index is the crucial part in order to actually manage such nested structures (which are very helpful in a lot of scenarios where formal classes are too inflexible). What you need to master are a) being able to select a specific branch in the nested list and b) being able to update it (e.g. adding new branches). A use case would be parsing a config file format to a nested list structure.

Initially, I tried to get the job done by recursively traversing the branch structure, but that turned out to be really messy and inefficient.

Basically, the current approach is all about mimicing what ‘str()’ does and putting this into a data frame structure:

x str(x)
List of 1
$ a:List of 2
..$ a.1: int [1:3] 1 2 3
..$ a.2: logi TRUE

At this point, I’ll simply present you what the result of indexing such nested structure looks like. As I don’t have a Github repository set up yet, I’ll put the source code at the very end. Of course, there’s a lot of overhead since it’s based on a lot of regular expression stuff. If you have any suggestions for improvement, you’re more than welcome to tell me!

I’m very sorry for the messed up indentation, but I’ve tried twice. WordPress keeps messing it up somehow. Suggestions on how to avoid this greatly appreciated.

Example list

setClass("TESTCLASS_X", representation=representation(a="numeric"))
setRefClass("TESTCLASS_Y", fields=list(a="numeric"))

src a=list(
a.1=list(a.1.1 = 1, a.1.2 = 1:5, a.1.3 = integer(0)),
a.2=list(a.2.1="a", a.2.2=c("a", "b", "c", "d", "e"), a.2.3=character()),
a.3=list(a.3.1=1.5, a.3.2=c(1.5, 2.5), a.3.3=numeric()),
a.4=list(a.4.1=1+1i, a.4.2=c(0.1+0.3i, 0.2+0.2i, 0.1+0.1i), a.4.3=complex()),
a.5=list(a.5.1 = TRUE, a.5.2 = c(TRUE, TRUE, TRUE), a.5.3 = logical())
),
b=list(
b.1=list(
b.1.1=list(b.1.1.1 =matrix(1, ncol=1),
b.1.1.2=matrix(c(1:4), ncol=2, dimnames=list(NULL, c("a","b"))),
b.1.1.3=matrix()
),
b.1.2=list(b.1.2.1=data.frame(1), b.1.2.2=data.frame(a=1:3, b=1:3),
b.1.2.3=data.frame()
)
),
b.2=list(
b.2.1=list(b.2.1.1=list(a=1), b.2.1.2=list(a=1:2, b=letters[1:2]),
b.2.1.3=list(a=1:2, b="a", c=1:3, d=FALSE, e=1:5), b.2.1.4=list()
)
)
),
c=list(
c.1=list(c.1.1=new.env(), c.1.2=new("TESTCLASS_X", a=1:5),
c.1.3=new("TESTCLASS_Y", a=1:5))
)
)

After indexing

> objectIndex(src=src)
name pos is.top is.bottom       class is.s4  dim
1                      a   1   TRUE     FALSE        list FALSE    5
2                  a/a.1   2  FALSE     FALSE        list FALSE    3
3            a/a.1/a.1.1   3  FALSE      TRUE     numeric FALSE    1
4            a/a.1/a.1.2   3  FALSE      TRUE     integer FALSE    5
5            a/a.1/a.1.3   3  FALSE      TRUE     integer FALSE    0
6                  a/a.2   2  FALSE     FALSE        list FALSE    3
7            a/a.2/a.2.1   3  FALSE      TRUE   character FALSE    1
8            a/a.2/a.2.2   3  FALSE      TRUE   character FALSE    5
9            a/a.2/a.2.3   3  FALSE      TRUE   character FALSE    0
10                 a/a.3   2  FALSE     FALSE        list FALSE    3
11           a/a.3/a.3.1   3  FALSE      TRUE     numeric FALSE    1
12           a/a.3/a.3.2   3  FALSE      TRUE     numeric FALSE    2
13           a/a.3/a.3.3   3  FALSE      TRUE     numeric FALSE    0
14                 a/a.4   2  FALSE     FALSE        list FALSE    3
15           a/a.4/a.4.1   3  FALSE      TRUE     complex FALSE    1
16           a/a.4/a.4.2   3  FALSE      TRUE     complex FALSE    3
17           a/a.4/a.4.3   3  FALSE      TRUE     complex FALSE    0
18                 a/a.5   2  FALSE     FALSE        list FALSE    3
19           a/a.5/a.5.1   3  FALSE      TRUE     logical FALSE    1
20           a/a.5/a.5.2   3  FALSE      TRUE     logical FALSE    3
21           a/a.5/a.5.3   3  FALSE      TRUE     logical FALSE    0
22                     b   1   TRUE     FALSE        list FALSE    2
23                 b/b.1   2  FALSE     FALSE        list FALSE    2
24           b/b.1/b.1.1   3  FALSE     FALSE        list FALSE    3
25   b/b.1/b.1.1/b.1.1.1   4  FALSE      TRUE      matrix FALSE  1-1
26   b/b.1/b.1.1/b.1.1.2   4  FALSE      TRUE      matrix FALSE  2-2
27   b/b.1/b.1.1/b.1.1.3   4  FALSE      TRUE      matrix FALSE  1-1
28           b/b.1/b.1.2   3  FALSE     FALSE        list FALSE    3
29   b/b.1/b.1.2/b.1.2.1   4  FALSE      TRUE  data.frame FALSE  1-1
30   b/b.1/b.1.2/b.1.2.2   4  FALSE      TRUE  data.frame FALSE  3-2
31   b/b.1/b.1.2/b.1.2.3   4  FALSE      TRUE  data.frame FALSE  0-0
32                 b/b.2   2  FALSE     FALSE        list FALSE    1
33           b/b.2/b.2.1   3  FALSE     FALSE        list FALSE    4
34   b/b.2/b.2.1/b.2.1.1   4  FALSE     FALSE        list FALSE    1
35 b/b.2/b.2.1/b.2.1.1/a   5  FALSE      TRUE     numeric FALSE    1
36   b/b.2/b.2.1/b.2.1.2   4  FALSE     FALSE        list FALSE    2
37 b/b.2/b.2.1/b.2.1.2/a   5  FALSE      TRUE     integer FALSE    2
38 b/b.2/b.2.1/b.2.1.2/b   5  FALSE      TRUE   character FALSE    2
39   b/b.2/b.2.1/b.2.1.3   4  FALSE     FALSE        list FALSE    5
40 b/b.2/b.2.1/b.2.1.3/a   5  FALSE      TRUE     integer FALSE    2
41 b/b.2/b.2.1/b.2.1.3/b   5  FALSE      TRUE   character FALSE    1
42 b/b.2/b.2.1/b.2.1.3/c   5  FALSE      TRUE     integer FALSE    3
43 b/b.2/b.2.1/b.2.1.3/d   5  FALSE      TRUE     logical FALSE    1
44 b/b.2/b.2.1/b.2.1.3/e   5  FALSE      TRUE     integer FALSE    5
45   b/b.2/b.2.1/b.2.1.4   4  FALSE     FALSE        list FALSE    0
46                     c   1   TRUE     FALSE        list FALSE    1
47                 c/c.1   2  FALSE     FALSE        list FALSE    3
48           c/c.1/c.1.1   3  FALSE      TRUE environment FALSE
49           c/c.1/c.1.2   3  FALSE      TRUE TESTCLASS_X  TRUE    1
50           c/c.1/c.1.3   3  FALSE      TRUE TESTCLASS_Y  TRUE    1

> objectIndex(src=src, handle.preserve="all")
name pos is.top is.bottom       class is.s4  dim
1                    a   1   TRUE     FALSE        list FALSE    5
2                a/a.1   2  FALSE      TRUE        list FALSE    3
3                a/a.2   2  FALSE      TRUE        list FALSE    3
4                a/a.3   2  FALSE      TRUE        list FALSE    3
5                a/a.4   2  FALSE      TRUE        list FALSE    3
6                a/a.5   2  FALSE      TRUE        list FALSE    3
7                    b   1   TRUE     FALSE        list FALSE    2
8                b/b.1   2  FALSE     FALSE        list FALSE    2
9          b/b.1/b.1.1   3  FALSE     FALSE        list FALSE    3
10 b/b.1/b.1.1/b.1.1.1   4  FALSE      TRUE      matrix FALSE  1-1
11 b/b.1/b.1.1/b.1.1.2   4  FALSE      TRUE      matrix FALSE  2-2
12 b/b.1/b.1.1/b.1.1.3   4  FALSE      TRUE      matrix FALSE  1-1
13         b/b.1/b.1.2   3  FALSE     FALSE        list FALSE    3
14 b/b.1/b.1.2/b.1.2.1   4  FALSE      TRUE  data.frame FALSE  1-1
15 b/b.1/b.1.2/b.1.2.2   4  FALSE      TRUE  data.frame FALSE  3-2
16 b/b.1/b.1.2/b.1.2.3   4  FALSE      TRUE  data.frame FALSE  0-0
17               b/b.2   2  FALSE     FALSE        list FALSE    1
18         b/b.2/b.2.1   3  FALSE     FALSE        list FALSE    4
19 b/b.2/b.2.1/b.2.1.1   4  FALSE      TRUE        list FALSE    1
20 b/b.2/b.2.1/b.2.1.2   4  FALSE      TRUE        list FALSE    2
21 b/b.2/b.2.1/b.2.1.3   4  FALSE      TRUE        list FALSE    5
22 b/b.2/b.2.1/b.2.1.4   4  FALSE     FALSE        list FALSE    0
23                   c   1   TRUE     FALSE        list FALSE    1
24               c/c.1   2  FALSE     FALSE        list FALSE    3
25         c/c.1/c.1.1   3  FALSE      TRUE environment FALSE
26         c/c.1/c.1.2   3  FALSE     FALSE TESTCLASS_X  TRUE    1
27         c/c.1/c.1.3   3  FALSE     FALSE TESTCLASS_Y  TRUE    1

> objectIndex(src=src, handle.preserve="data.frame")
name pos is.top is.bottom       class is.s4  dim
1                      a   1   TRUE     FALSE        list FALSE    5
2                  a/a.1   2  FALSE     FALSE        list FALSE    3
3            a/a.1/a.1.1   3  FALSE      TRUE     numeric FALSE    1
4            a/a.1/a.1.2   3  FALSE      TRUE     integer FALSE    5
5            a/a.1/a.1.3   3  FALSE      TRUE     integer FALSE    0
6                  a/a.2   2  FALSE     FALSE        list FALSE    3
7            a/a.2/a.2.1   3  FALSE      TRUE   character FALSE    1
8            a/a.2/a.2.2   3  FALSE      TRUE   character FALSE    5
9            a/a.2/a.2.3   3  FALSE      TRUE   character FALSE    0
10                 a/a.3   2  FALSE     FALSE        list FALSE    3
11           a/a.3/a.3.1   3  FALSE      TRUE     numeric FALSE    1
12           a/a.3/a.3.2   3  FALSE      TRUE     numeric FALSE    2
13           a/a.3/a.3.3   3  FALSE      TRUE     numeric FALSE    0
14                 a/a.4   2  FALSE     FALSE        list FALSE    3
15           a/a.4/a.4.1   3  FALSE      TRUE     complex FALSE    1
16           a/a.4/a.4.2   3  FALSE      TRUE     complex FALSE    3
17           a/a.4/a.4.3   3  FALSE      TRUE     complex FALSE    0
18                 a/a.5   2  FALSE     FALSE        list FALSE    3
19           a/a.5/a.5.1   3  FALSE      TRUE     logical FALSE    1
20           a/a.5/a.5.2   3  FALSE      TRUE     logical FALSE    3
21           a/a.5/a.5.3   3  FALSE      TRUE     logical FALSE    0
22                     b   1   TRUE     FALSE        list FALSE    2
23                 b/b.1   2  FALSE     FALSE        list FALSE    2
24           b/b.1/b.1.1   3  FALSE     FALSE        list FALSE    3
25   b/b.1/b.1.1/b.1.1.1   4  FALSE      TRUE      matrix FALSE  1-1
26   b/b.1/b.1.1/b.1.1.2   4  FALSE      TRUE      matrix FALSE  2-2
27   b/b.1/b.1.1/b.1.1.3   4  FALSE      TRUE      matrix FALSE  1-1
28           b/b.1/b.1.2   3  FALSE     FALSE        list FALSE    3
29   b/b.1/b.1.2/b.1.2.1   4  FALSE      TRUE  data.frame FALSE  1-1
30   b/b.1/b.1.2/b.1.2.2   4  FALSE      TRUE  data.frame FALSE  3-2
31   b/b.1/b.1.2/b.1.2.3   4  FALSE      TRUE  data.frame FALSE  0-0
32                 b/b.2   2  FALSE     FALSE        list FALSE    1
33           b/b.2/b.2.1   3  FALSE     FALSE        list FALSE    4
34   b/b.2/b.2.1/b.2.1.1   4  FALSE     FALSE        list FALSE    1
35 b/b.2/b.2.1/b.2.1.1/a   5  FALSE      TRUE     numeric FALSE    1
36   b/b.2/b.2.1/b.2.1.2   4  FALSE     FALSE        list FALSE    2
37 b/b.2/b.2.1/b.2.1.2/a   5  FALSE      TRUE     integer FALSE    2
38 b/b.2/b.2.1/b.2.1.2/b   5  FALSE      TRUE   character FALSE    2
39   b/b.2/b.2.1/b.2.1.3   4  FALSE     FALSE        list FALSE    5
40 b/b.2/b.2.1/b.2.1.3/a   5  FALSE      TRUE     integer FALSE    2
41 b/b.2/b.2.1/b.2.1.3/b   5  FALSE      TRUE   character FALSE    1
42 b/b.2/b.2.1/b.2.1.3/c   5  FALSE      TRUE     integer FALSE    3
43 b/b.2/b.2.1/b.2.1.3/d   5  FALSE      TRUE     logical FALSE    1
44 b/b.2/b.2.1/b.2.1.3/e   5  FALSE      TRUE     integer FALSE    5
45   b/b.2/b.2.1/b.2.1.4   4  FALSE     FALSE        list FALSE    0
46                     c   1   TRUE     FALSE        list FALSE    1
47                 c/c.1   2  FALSE     FALSE        list FALSE    3
48           c/c.1/c.1.1   3  FALSE      TRUE environment FALSE
49           c/c.1/c.1.2   3  FALSE     FALSE TESTCLASS_X  TRUE    1
50         c/c.1/c.1.2/a   5  FALSE      TRUE     integer FALSE    5
51           c/c.1/c.1.3   3  FALSE     FALSE TESTCLASS_Y  TRUE    1
52         c/c.1/c.1.3/a   4  FALSE      TRUE     integer FALSE    5

> objectIndex(src=src, handle.preserve="s4")
name pos is.top is.bottom       class is.s4  dim
1                       a   1   TRUE     FALSE        list FALSE    5
2                   a/a.1   2  FALSE     FALSE        list FALSE    3
3             a/a.1/a.1.1   3  FALSE      TRUE     numeric FALSE    1
4             a/a.1/a.1.2   3  FALSE      TRUE     integer FALSE    5
5             a/a.1/a.1.3   3  FALSE      TRUE     integer FALSE    0
6                   a/a.2   2  FALSE     FALSE        list FALSE    3
7             a/a.2/a.2.1   3  FALSE      TRUE   character FALSE    1
8             a/a.2/a.2.2   3  FALSE      TRUE   character FALSE    5
9             a/a.2/a.2.3   3  FALSE      TRUE   character FALSE    0
10                  a/a.3   2  FALSE     FALSE        list FALSE    3
11            a/a.3/a.3.1   3  FALSE      TRUE     numeric FALSE    1
12            a/a.3/a.3.2   3  FALSE      TRUE     numeric FALSE    2
13            a/a.3/a.3.3   3  FALSE      TRUE     numeric FALSE    0
14                  a/a.4   2  FALSE     FALSE        list FALSE    3
15            a/a.4/a.4.1   3  FALSE      TRUE     complex FALSE    1
16            a/a.4/a.4.2   3  FALSE      TRUE     complex FALSE    3
17            a/a.4/a.4.3   3  FALSE      TRUE     complex FALSE    0
18                  a/a.5   2  FALSE     FALSE        list FALSE    3
19            a/a.5/a.5.1   3  FALSE      TRUE     logical FALSE    1
20            a/a.5/a.5.2   3  FALSE      TRUE     logical FALSE    3
21            a/a.5/a.5.3   3  FALSE      TRUE     logical FALSE    0
22                      b   1   TRUE     FALSE        list FALSE    2
23                  b/b.1   2  FALSE     FALSE        list FALSE    2
24            b/b.1/b.1.1   3  FALSE     FALSE        list FALSE    3
25    b/b.1/b.1.1/b.1.1.1   4  FALSE      TRUE      matrix FALSE  1-1
26    b/b.1/b.1.1/b.1.1.2   4  FALSE      TRUE      matrix FALSE  2-2
27    b/b.1/b.1.1/b.1.1.3   4  FALSE      TRUE      matrix FALSE  1-1
28            b/b.1/b.1.2   3  FALSE     FALSE        list FALSE    3
29    b/b.1/b.1.2/b.1.2.1   4  FALSE      TRUE  data.frame FALSE  1-1
30 b/b.1/b.1.2/b.1.2.1/X1   5  FALSE      TRUE     numeric FALSE    1
31    b/b.1/b.1.2/b.1.2.2   4  FALSE      TRUE  data.frame FALSE  3-2
32  b/b.1/b.1.2/b.1.2.2/a   5  FALSE      TRUE     integer FALSE    3
33  b/b.1/b.1.2/b.1.2.2/b   5  FALSE      TRUE     integer FALSE    3
34    b/b.1/b.1.2/b.1.2.3   4  FALSE      TRUE  data.frame FALSE  0-0
35                  b/b.2   2  FALSE     FALSE        list FALSE    1
36            b/b.2/b.2.1   3  FALSE     FALSE        list FALSE    4
37    b/b.2/b.2.1/b.2.1.1   4  FALSE     FALSE        list FALSE    1
38  b/b.2/b.2.1/b.2.1.1/a   5  FALSE      TRUE     numeric FALSE    1
39    b/b.2/b.2.1/b.2.1.2   4  FALSE     FALSE        list FALSE    2
40  b/b.2/b.2.1/b.2.1.2/a   5  FALSE      TRUE     integer FALSE    2
41  b/b.2/b.2.1/b.2.1.2/b   5  FALSE      TRUE   character FALSE    2
42    b/b.2/b.2.1/b.2.1.3   4  FALSE     FALSE        list FALSE    5
43  b/b.2/b.2.1/b.2.1.3/a   5  FALSE      TRUE     integer FALSE    2
44  b/b.2/b.2.1/b.2.1.3/b   5  FALSE      TRUE   character FALSE    1
45  b/b.2/b.2.1/b.2.1.3/c   5  FALSE      TRUE     integer FALSE    3
46  b/b.2/b.2.1/b.2.1.3/d   5  FALSE      TRUE     logical FALSE    1
47  b/b.2/b.2.1/b.2.1.3/e   5  FALSE      TRUE     integer FALSE    5
48    b/b.2/b.2.1/b.2.1.4   4  FALSE     FALSE        list FALSE    0
49                      c   1   TRUE     FALSE        list FALSE    1
50                  c/c.1   2  FALSE     FALSE        list FALSE    3
51            c/c.1/c.1.1   3  FALSE      TRUE environment FALSE
52            c/c.1/c.1.2   3  FALSE      TRUE TESTCLASS_X  TRUE    1
53            c/c.1/c.1.3   3  FALSE      TRUE TESTCLASS_Y  TRUE    1

> objectIndex(src=src, handle.preserve=c("data.frame", "s4"))
name pos is.top is.bottom       class is.s4  dim
1                      a   1   TRUE     FALSE        list FALSE    5
2                  a/a.1   2  FALSE     FALSE        list FALSE    3
3            a/a.1/a.1.1   3  FALSE      TRUE     numeric FALSE    1
4            a/a.1/a.1.2   3  FALSE      TRUE     integer FALSE    5
5            a/a.1/a.1.3   3  FALSE      TRUE     integer FALSE    0
6                  a/a.2   2  FALSE     FALSE        list FALSE    3
7            a/a.2/a.2.1   3  FALSE      TRUE   character FALSE    1
8            a/a.2/a.2.2   3  FALSE      TRUE   character FALSE    5
9            a/a.2/a.2.3   3  FALSE      TRUE   character FALSE    0
10                 a/a.3   2  FALSE     FALSE        list FALSE    3
11           a/a.3/a.3.1   3  FALSE      TRUE     numeric FALSE    1
12           a/a.3/a.3.2   3  FALSE      TRUE     numeric FALSE    2
13           a/a.3/a.3.3   3  FALSE      TRUE     numeric FALSE    0
14                 a/a.4   2  FALSE     FALSE        list FALSE    3
15           a/a.4/a.4.1   3  FALSE      TRUE     complex FALSE    1
16           a/a.4/a.4.2   3  FALSE      TRUE     complex FALSE    3
17           a/a.4/a.4.3   3  FALSE      TRUE     complex FALSE    0
18                 a/a.5   2  FALSE     FALSE        list FALSE    3
19           a/a.5/a.5.1   3  FALSE      TRUE     logical FALSE    1
20           a/a.5/a.5.2   3  FALSE      TRUE     logical FALSE    3
21           a/a.5/a.5.3   3  FALSE      TRUE     logical FALSE    0
22                     b   1   TRUE     FALSE        list FALSE    2
23                 b/b.1   2  FALSE     FALSE        list FALSE    2
24           b/b.1/b.1.1   3  FALSE     FALSE        list FALSE    3
25   b/b.1/b.1.1/b.1.1.1   4  FALSE      TRUE      matrix FALSE  1-1
26   b/b.1/b.1.1/b.1.1.2   4  FALSE      TRUE      matrix FALSE  2-2
27   b/b.1/b.1.1/b.1.1.3   4  FALSE      TRUE      matrix FALSE  1-1
28           b/b.1/b.1.2   3  FALSE     FALSE        list FALSE    3
29   b/b.1/b.1.2/b.1.2.1   4  FALSE      TRUE  data.frame FALSE  1-1
30   b/b.1/b.1.2/b.1.2.2   4  FALSE      TRUE  data.frame FALSE  3-2
31   b/b.1/b.1.2/b.1.2.3   4  FALSE      TRUE  data.frame FALSE  0-0
32                 b/b.2   2  FALSE     FALSE        list FALSE    1
33           b/b.2/b.2.1   3  FALSE     FALSE        list FALSE    4
34   b/b.2/b.2.1/b.2.1.1   4  FALSE     FALSE        list FALSE    1
35 b/b.2/b.2.1/b.2.1.1/a   5  FALSE      TRUE     numeric FALSE    1
36   b/b.2/b.2.1/b.2.1.2   4  FALSE     FALSE        list FALSE    2
37 b/b.2/b.2.1/b.2.1.2/a   5  FALSE      TRUE     integer FALSE    2
38 b/b.2/b.2.1/b.2.1.2/b   5  FALSE      TRUE   character FALSE    2
39   b/b.2/b.2.1/b.2.1.3   4  FALSE     FALSE        list FALSE    5
40 b/b.2/b.2.1/b.2.1.3/a   5  FALSE      TRUE     integer FALSE    2
41 b/b.2/b.2.1/b.2.1.3/b   5  FALSE      TRUE   character FALSE    1
42 b/b.2/b.2.1/b.2.1.3/c   5  FALSE      TRUE     integer FALSE    3
43 b/b.2/b.2.1/b.2.1.3/d   5  FALSE      TRUE     logical FALSE    1
44 b/b.2/b.2.1/b.2.1.3/e   5  FALSE      TRUE     integer FALSE    5
45   b/b.2/b.2.1/b.2.1.4   4  FALSE     FALSE        list FALSE    0
46                     c   1   TRUE     FALSE        list FALSE    1
47                 c/c.1   2  FALSE     FALSE        list FALSE    3
48           c/c.1/c.1.1   3  FALSE      TRUE environment FALSE
49           c/c.1/c.1.2   3  FALSE      TRUE TESTCLASS_X  TRUE    1
50           c/c.1/c.1.3   3  FALSE      TRUE TESTCLASS_Y  TRUE    1

Benchmarking


> require(microbenchmark)
> res <- microbenchmark(objectIndex(src=src))
median(res$time/1000000000)
> [1] 0.3378009

Here’s the code:

Generics


setGeneric(
name="argProcess",
signature=c("arg"),
def=function(
arg,
...
){
standardGeneric("argProcess")
}
)

setGeneric(
name="gregexpr2",
signature=c("src"),
def=function(
src,
...
){
standardGeneric("gregexpr2")
}
)

setGeneric(
name="objectIndex",
signature=c("src"),
def=function(
src,
...
){
standardGeneric("objectIndex")
}
)

Methods


setMethod(
f="argProcess",
signature=signature(arg="character"),
definition=function(
arg,
value.valid=NULL,
idx.restrict=NULL,
do.remove.null=FALSE,
...
){

#---------------------------------------------------------------------------
# ACTUAL PROCESSING
#---------------------------------------------------------------------------

if(!is.null(value.valid)){
subsetValidate(src=arg, tgt=value.valid,
.ARGS=.ARGS)
}

out <- arg
#    if(length(arg) > 1){
#        out <- arg[1]
#    }
if(!is.null(idx.restrict)){
if(!is.numeric(idx.restrict)){
stop(paste("/invalid restriction index", sep=""))
}
if(length(idx.restrict) > length(out)){
stop(paste("/invalid restriction index length", sep=""))
}
if(idx.restrict == 0){
if(length(out) > 1){
msg <- c(
paste("/invalid argument value", sep=""), "\n",
paste("* Value: '", paste(out, collapse=", "), "'", sep=""), "\n",
paste("* Needs to be singular", sep="")
)
stop(msg)
}
} else {
out <- out[idx.restrict]
}
}

# REMOVE NULL
if(do.remove.null){
idx.null <- which(is.null(out))
if(length(idx.null)){
out <- out[-idx.null]
}
}
# /

# /ACTUAL PROCESSING ----------

#---------------------------------------------------------------------------
# FINALIZE
#---------------------------------------------------------------------------

return(out)

# /FINALIZE ----------

}
)

setMethod(
f="gregexpr2",
signature=signature(src="character"),
definition=function(
src,
pattern,
do.ignore.case=FALSE,
do.perl=TRUE,
do.fixed=FALSE,
do.useBytes=FALSE,
do.simplify=FALSE,
...
){

#---------------------------------------------------------------------------
# ACTUAL PROCESSING
#---------------------------------------------------------------------------

if(all(is.na(src))){
out <- lapply(1:length(src), function(x){
out <- data.frame(start=NA, stop=NA)
})
} else {
idx.regex <- gregexpr(
text=src,
pattern=pattern,
ignore.case=do.ignore.case,
perl=do.perl,
fixed=do.fixed,
useBytes=do.useBytes
)
out <- lapply(idx.regex, function(x){
if(any(x == -1)){
out <- data.frame(start=NA, stop=NA)
} else {
out <- data.frame(start=x, stop=x+(attributes(x)$match.length-1))
}
return(out)
})
}

if(do.simplify){
if(length(out) == 1){
out <- out[[1]]
}
}

# /ACTUAL PROCESSING ----------

#---------------------------------------------------------------------------
# FINALIZE
#---------------------------------------------------------------------------

return(out)

# /FINALIZE ----------

}
)

setMethod(
f="objectIndex",
signature=signature(src="list"),
definition=function(
src,
handle.preserve=c("standard", "all", "no", "data.frame", "s4"),
.delim.path="/",
...
){

#---------------------------------------------------------------------------
# GENERAL PREPROCESSING
#---------------------------------------------------------------------------

handle.preserve <- argProcess(
arg=handle.preserve,
value.valid=c("standard", "all", "no", "data.frame", "s4")
)
if("standard" %in% handle.preserve){
handle.preserve <- "standard"
}

# /GENERAL PREPROCESSING ----------

#---------------------------------------------------------------------------
# PREPROCESSING
#---------------------------------------------------------------------------

# TRANSFORM IF FLAT
if(all(length(grep(.delim.path, names(src))>0))){
# This is assuming that src is already an flattened list.
cat(paste("/preprocessing 'src' (deepening) ...", sep=""), sep="\n")
src <- objectDeepen(src=src, .ARGS=.ARGS)
}
# /

pattern <- "(\\.\\.)"

# RETRIEVE STRUCTURE
temp <- capture.output(str(src))
temp <- temp[-1]
# /

# DROP ATTRIBUTES
idx <- grep("\\.- attr\\(", temp)
idx.drop <- unlist(lapply(idx, function(x.idx){
temp.attr <- temp[x.idx]
idx.1 <- gregexpr2(
src=temp.attr,
pattern="(\\[.*\\])|([[:digit:]]+\\s*obs|of|with).*[[:digit:]]+",
do.simplify=TRUE
)
out <- NULL
if(all(!is.na(unlist(idx.1)))){
out <- as.numeric(
gsub("[[:alpha:]]|[[:punct:]]|\\s", "", substr(temp.attr, start=idx.1$start, stop=idx.1$stop))
)
out <- x.idx:(x.idx+out)
}
}))
if(length(idx.drop)){
temp <- temp[-idx.drop]
}
#temp
# /

# DROP METHODS
idx.drop <- grep("\\..*[[:digit:]]+\\s*methods", temp)
if(length(idx.drop)){
temp <- temp[-idx.drop]
}
#temp
# /

# /PREPROCESSING ----------

#---------------------------------------------------------------------------
# CLASS + DIMENSION
#---------------------------------------------------------------------------

# GET INDEX
idx <- gregexpr2(
src=temp,
pattern=":.*$"
)
# /

# CLASS AND DIM INFO
class.info <- lapply(seq(along=idx), function(x.idx){
out.0 <- substr(temp[x.idx], start=idx[[x.idx]]$start, stop=idx[[x.idx]]$stop)
if(is.na(out.0)){
return(NULL)
}
# DIMENSION / LENGTH
out.dim <- 1
# Check for zero length
idx.1 <- gregexpr2(
src=out.0,
pattern="^:\\s*\\w+\\(.*\\)",
do.simplify=TRUE
)
if(!all(is.na(unlist(idx.1)))){
out.dim <- 0
idx.1   <- NA
} else {
# Check for non-zero length
idx.1 <- gregexpr2(
src=out.0,
#                    pattern="\\[[.*\\]]",
pattern="(\\[\\d+,?.*\\])|(:\t[[:digit:]]|(of|with).*[[:digit:]]+)",
do.simplify=TRUE
)
}
#print(idx.1)
#        out.dim <- NA
if(all(!is.na(unlist(idx.1)))){
#x.idx=1
out.dim <- lapply(1:nrow(idx.1), function(x.idx){
out.dim <- substr(out.0, start=idx.1$start[x.idx],
stop=idx.1$stop[x.idx])
out.dim <- unlist(strsplit(out.dim, split=", "))
out.dim <- lapply(out.dim, function(x.dim){
out.dim <- x.dim
idx.1 <- gregexpr2(
src=out.dim,
pattern="[[:digit:]]+",
do.simplify=TRUE
)
out.dim <- sapply(1:nrow(idx.1), function(x.row){
substr(out.dim, start=idx.1$start[x.row], stop=idx.1$stop[x.row])
})
})
if(length(out.dim) == 1){
out.dim <- out.dim[[1]]
}
out.dim
})
}
#print(out.dim)
# /
# CLASS
idx.s4 <- FALSE

idx.1 <- gregexpr2(
src=out.0,
pattern="^:\\s*<?\\w+:?|('.*')",
do.simplify=TRUE
)
out.class <- NA
if(all(!is.na(idx.1$start))){
out.class <- sapply(1:nrow(idx.1), function(x.row){
out <- substr(out.0, start=idx.1$start[x.row], stop=idx.1$stop[x.row])
out <- gsub("^[[:punct:]]*|[[:punct:]]*$|\\s*", "", out)
})
}
if(class(out.dim[[1]]) == "list" & any(out.class != "data.frame")){
out.class <- "matrix"
} else {
out.dim <- unlist(out.dim)
}

out.class <- switch(
out.class[1],
"chr"="character",
"cplx"="complex",
"data.frame"={
out.dim <- paste(out.dim, collapse="-")
"data.frame"
},
"environment"={
out.dim <- NA
"environment"
},
"Formal"={
idx.s4 <- TRUE
out.class[2]
},
"int"="integer",
"list"="list",
"List"="list",
"logi"="logical",
"matrix"={
out.dim <- sapply(out.dim[[1]], function(x){
out.dim <- x
if(length(out.dim) > 1){
out.dim <- out.dim[2]
}
out.dim
})
out.dim <- paste(out.dim, collapse="-")
"matrix"
},
"num"="numeric",
"Reference"={
idx.s4 <- TRUE
out.class[2]
}
)
# /
if(length(out.dim) > 1){
out.dim <- out.dim[2]
}
out <- data.frame(class=out.class, is.s4=idx.s4,
dim=out.dim, stringsAsFactors=FALSE)
return(out)
})
class.info <- do.call("rbind", class.info)
# /

idx.na <- which(is.na(class.info$class))
if(length(idx.na)){
class.info <- class.info[-idx.na,]
}

# /CLASS + DIMENSION ----------

#---------------------------------------------------------------------------
# MAIN PROCESSING
#---------------------------------------------------------------------------

temp <- gsub(":.*$", "", temp)
temp <- gsub("\\s*\\$\\s*", "", temp)
temp <- gsub("^\\s*", "", temp)
temp <- gsub("@", "", temp)
struc.names <- gsub("\\s*$", "", temp)
#struc.names
# /

# TRANSFORM STRUCTURE
idx.regex <- gregexpr2(
src=struc.names,
pattern=pattern
)
struc.names <- gsub(paste(pattern, "|\\s*", sep=""), "", struc.names)
struc.pos   <- sapply(idx.regex, function(x){
out <- 0
if(all(!is.na(x$start))){
out <- nrow(x)
}
return(out)
})

# STOP ON EMPTY NAMES
idx.empty <- which(struc.names == "")
if(length(idx.empty)){
msg <- c(
paste("/names error:", sep=""), "\n",
paste("* Names: {", paste(struc.names, collapse=", "), "}", sep="")
)
stop(msg)
}
# /

# STRUCTURE
pos         <- -1
path.0      <- NULL
path.1      <- NULL
is.top      <- NULL
idx.zero <- which(struc.pos == 0)
for(x in 1:length(struc.pos)){
res         <- struc.pos[x]
pos.new     <- list(res)
path.0.new  <- struc.names[x]
if(struc.pos[x] > pos){
pos <- struc.pos[x]
} else {
# RESET
if(struc.pos[x] < pos){
pos <- struc.pos[x]
}
if(pos == 0){
path.0 <- NULL
} else {
path.0 <- path.0[1:pos]
}
# /
pos.new <- list(0:res)

}
path.0  <- c(path.0, path.0.new)

path.1.new  <- paste(path.0, collapse=.delim.path)
path.1  <- c(path.1, path.1.new)

is.top.new  <- length(path.0) == 1
is.top      <- c(is.top, is.top.new)
}

# STRUCTURE DF
struc.df <- data.frame(
name=path.1,
pos=struc.pos+1,
is.top=is.top,
is.bottom=FALSE,
class.info,
stringsAsFactors=FALSE
)
# /

# BOTTOM BRANCHES
classes.pres <- "list"
idx.bottom <- which(!(struc.df$class %in% classes.pres) & !struc.df$is.s4)
if(length(idx.bottom)){
struc.df$is.bottom[idx.bottom] <- TRUE
}
# /

# HANDLE PRESERVE
if(!("no" %in% handle.preserve)){
if(any(c("standard", "data.frame") %in% handle.preserve)){
idx.pres <- which(struc.df$class == "data.frame")
if(length(idx.pres)){
struc.df$is.bottom[idx.pres] <- TRUE
obj.dim <- sapply(strsplit(struc.df$dim[idx.pres], split="-"), function(x){
as.numeric(x[2])
})
idx.drop    <- which(obj.dim == 0)
if(length(idx.drop)){
obj.dim     <- obj.dim[-idx.drop]
idx.pres    <- idx.pres[-idx.drop]
}
idx.temp    <- data.frame(start=idx.pres+1, stop=idx.pres+obj.dim)
idx.drop    <- unlist(lapply(1:nrow(idx.temp), function(x.row){
idx.temp$start[x.row]:idx.temp$stop[x.row]
}))
if(length(idx.drop)){
struc.df <- struc.df[-idx.drop,]
rownames(struc.df) <- NULL
}
}
}
if(any(c("standard", "s4") %in% handle.preserve)){
idx.pres <- which(struc.df$is.s4)
if(length(idx.pres)){
struc.df$is.bottom[idx.pres] <- TRUE
obj.dim     <- as.numeric(struc.df$dim[idx.pres])
idx.temp    <- data.frame(start=idx.pres+1, stop=idx.pres+obj.dim)
idx.drop    <- unlist(lapply(1:nrow(idx.temp), function(x.row){
idx.temp$start[x.row]:idx.temp$stop[x.row]
}))
idx.keep    <- which(idx.drop %in% idx.pres)
if(length(idx.keep)){
idx.drop <- idx.drop[-idx.keep]
}
if(length(idx.drop)){
struc.df <- struc.df[-idx.drop,]
rownames(struc.df) <- NULL
}
}
}
if(any("all" %in% handle.preserve)){
idx.pres <- which(struc.df$class == "list")
if(length(idx.pres)){
classes.pres <- c("data.frame", "environment", "list", "matrix")
idx.drop <- which(
struc.df$is.bottom &
!(struc.df$class %in% classes.pres) &
!(struc.df$is.s4)
)
idx.change <- idx.pres + 1
idx <- which(idx.change %in% idx.drop)
if(length(idx)){
idx.change <- idx.change[idx] - 1
}
struc.df$is.bottom[idx.change] <- TRUE
if(length(idx.drop)){
struc.df <- struc.df[-idx.drop,]
rownames(struc.df) <- NULL
}
}
}
}
# /

out <- struc.df

# /MAIN PROCESSING ----------

#---------------------------------------------------------------------------
# FINALIZE
#---------------------------------------------------------------------------

return(out)

# /FINALIZE ----------

}
)

subsetValidate <- function(
src,
tgt,
do.index=FALSE,
index.type=c("logical", "numeric", "logical.all", "logical.any"),
do.strict=TRUE,
do.warning=TRUE,
do.allow.nomatch=FALSE,
value.nomatch=NULL,
...
){
#---------------------------------------------------------------------------
# GENERAL PREPROCESSING
#---------------------------------------------------------------------------

if(missing(src) | missing(tgt)){
stop("/missing arguments 'src' and/or 'tgt'", sep="")
}
# VALIDATING AND PROCESSING SOURCE AND TARGET
do.special.src <- FALSE
if(!is.character(src)){
if(.do.verbose | .do.debug){
cat(paste("/non-character 'src'. Processing ...",
sep=""), sep="\n")
}
if(!is.na(src)){
if(is.null(names(src))){
stop(paste("/'src' has no names. Processing failed.",
sep=""), sep="\n")
}
}
do.special.src <- TRUE
src.0   <- src
src     <- names(src)
}
do.special.tgt <- FALSE
if(!is.character(tgt)){
if(.do.verbose | .do.debug){
cat(paste("/non-character 'tgt'. Processing ...",
sep=""), sep="\n")
}
if(!is.na(tgt)){
if(is.null(names(tgt))){
stop(paste("/'tgt' has no names. Processing failed.",
sep=""), sep="\n")
}
}
do.special.tgt <- TRUE
tgt.0   <- tgt
tgt     <- names(tgt)
}
# /
#    if(!is.character(src) | !is.character(tgt)){
#        stop(paste(
#            "/expecting args 'src' and 'tgt' to be of class 'character'",
#            sep=""))
#    }
index.type.valid <- c("logical", "numeric", "logical.all", "logical.any")
if(!all(index.type %in% index.type.valid)){
msg <- c(
paste("/arg 'index.type' is invalid:", sep=""),
"\n",
paste("* Value: ", paste(deparse(index.type), collapse=""), sep=""),
"\n",
paste("* Valid: ", paste(deparse(index.type.valid), collapse=""), sep="")
)
stop(msg)
}
if(do.index & length(index.type) > 1){
index.type <- index.type[1]
}

# /GENERAL PREPROCESSING ----------

#---------------------------------------------------------------------------
# ACTUAL PROCESSING
#---------------------------------------------------------------------------

idx.0 <- src %in% tgt
idx.1 <- which(idx.0)

if(do.index){
if(.do.verbose | .do.debug){
cat(paste("/swiching arg 'do.strict = FALSE", sep=""), sep="\n")
cat(paste("/swiching arg 'do.warning = FALSE", sep=""), sep="\n")
}
do.strict   <- FALSE
do.warning  <- FALSE
if(index.type == "logical"){
out.idx <- idx.0
names(out.idx) <- src
}
if(index.type == "numeric"){
out.idx <- idx.1
names(out.idx) <- src[out.idx]
}
if(index.type == "logical.all"){
out.idx <- all(idx.0)
}
if(index.type == "logical.any"){
out.idx <- any(idx.0)
}
}

# NO MATCH
if(!length(idx.1)){
if(!do.index){
if(do.allow.nomatch){
# LOCAL TERMINATE

return(value.nomatch)
} else {
msg <- c(
paste("/non-matching elements in 'src' (all):", sep=""),
"\n",
paste("* '", src, "'", sep="")
)
stop(msg)
}
} else {
if(index.type == "numeric"){
out.idx <- numeric(0)
}
return(any(idx.0))
}
}
# /
# PARTLY MATCH
if(!all(idx.0)){
msg <- c(
paste("/partly non-matching elements in 'src':", sep=""),
"\n",
paste("* '", src[which(!idx.0)], "'", sep="")
)
if(do.strict){
if(do.allow.nomatch){
# LOCAL TERMINATE

return(value.nomatch)
} else {
stop(msg)
}
} else {
if(do.warning){
warning(msg)
}
}
}
# /
if(!do.index){
if(do.special.src){
if(is.data.frame(src.0) | is.matrix(src.0)){
tgt <- src.0[,idx.1]
} else {
tgt <- src.0[idx.1]
}
} else {
tgt <- src[idx.1]
}
out <- tgt
} else {
out <- out.idx
}
# /

#---------------------------------------------------------------------------
# FINALIZE
#---------------------------------------------------------------------------

return(out)

# /FINALIZE ----------
}


To leave a comment for the author, please follow the link and comment on their blog: [R]appster.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.