Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve long searched for a somewhat efficient approach to indexing nested lists and/or environments and here’s my best solution so far.
For me, being able to compute such an index is the crucial part in order to actually manage such nested structures (which are very helpful in a lot of scenarios where formal classes are too inflexible). What you need to master are a) being able to select a specific branch in the nested list and b) being able to update it (e.g. adding new branches). A use case would be parsing a config file format to a nested list structure.
Initially, I tried to get the job done by recursively traversing the branch structure, but that turned out to be really messy and inefficient.
Basically, the current approach is all about mimicing what ‘str()’ does and putting this into a data frame structure:
x str(x) List of 1 $ a:List of 2 ..$ a.1: int [1:3] 1 2 3 ..$ a.2: logi TRUE
At this point, I’ll simply present you what the result of indexing such nested structure looks like. As I don’t have a Github repository set up yet, I’ll put the source code at the very end. Of course, there’s a lot of overhead since it’s based on a lot of regular expression stuff. If you have any suggestions for improvement, you’re more than welcome to tell me!
I’m very sorry for the messed up indentation, but I’ve tried twice. WordPress keeps messing it up somehow. Suggestions on how to avoid this greatly appreciated.
Example list
setClass("TESTCLASS_X", representation=representation(a="numeric")) setRefClass("TESTCLASS_Y", fields=list(a="numeric")) src a=list( a.1=list(a.1.1 = 1, a.1.2 = 1:5, a.1.3 = integer(0)), a.2=list(a.2.1="a", a.2.2=c("a", "b", "c", "d", "e"), a.2.3=character()), a.3=list(a.3.1=1.5, a.3.2=c(1.5, 2.5), a.3.3=numeric()), a.4=list(a.4.1=1+1i, a.4.2=c(0.1+0.3i, 0.2+0.2i, 0.1+0.1i), a.4.3=complex()), a.5=list(a.5.1 = TRUE, a.5.2 = c(TRUE, TRUE, TRUE), a.5.3 = logical()) ), b=list( b.1=list( b.1.1=list(b.1.1.1 =matrix(1, ncol=1), b.1.1.2=matrix(c(1:4), ncol=2, dimnames=list(NULL, c("a","b"))), b.1.1.3=matrix() ), b.1.2=list(b.1.2.1=data.frame(1), b.1.2.2=data.frame(a=1:3, b=1:3), b.1.2.3=data.frame() ) ), b.2=list( b.2.1=list(b.2.1.1=list(a=1), b.2.1.2=list(a=1:2, b=letters[1:2]), b.2.1.3=list(a=1:2, b="a", c=1:3, d=FALSE, e=1:5), b.2.1.4=list() ) ) ), c=list( c.1=list(c.1.1=new.env(), c.1.2=new("TESTCLASS_X", a=1:5), c.1.3=new("TESTCLASS_Y", a=1:5)) ) )
After indexing
> objectIndex(src=src) name pos is.top is.bottom class is.s4 dim 1 a 1 TRUE FALSE list FALSE 5 2 a/a.1 2 FALSE FALSE list FALSE 3 3 a/a.1/a.1.1 3 FALSE TRUE numeric FALSE 1 4 a/a.1/a.1.2 3 FALSE TRUE integer FALSE 5 5 a/a.1/a.1.3 3 FALSE TRUE integer FALSE 0 6 a/a.2 2 FALSE FALSE list FALSE 3 7 a/a.2/a.2.1 3 FALSE TRUE character FALSE 1 8 a/a.2/a.2.2 3 FALSE TRUE character FALSE 5 9 a/a.2/a.2.3 3 FALSE TRUE character FALSE 0 10 a/a.3 2 FALSE FALSE list FALSE 3 11 a/a.3/a.3.1 3 FALSE TRUE numeric FALSE 1 12 a/a.3/a.3.2 3 FALSE TRUE numeric FALSE 2 13 a/a.3/a.3.3 3 FALSE TRUE numeric FALSE 0 14 a/a.4 2 FALSE FALSE list FALSE 3 15 a/a.4/a.4.1 3 FALSE TRUE complex FALSE 1 16 a/a.4/a.4.2 3 FALSE TRUE complex FALSE 3 17 a/a.4/a.4.3 3 FALSE TRUE complex FALSE 0 18 a/a.5 2 FALSE FALSE list FALSE 3 19 a/a.5/a.5.1 3 FALSE TRUE logical FALSE 1 20 a/a.5/a.5.2 3 FALSE TRUE logical FALSE 3 21 a/a.5/a.5.3 3 FALSE TRUE logical FALSE 0 22 b 1 TRUE FALSE list FALSE 2 23 b/b.1 2 FALSE FALSE list FALSE 2 24 b/b.1/b.1.1 3 FALSE FALSE list FALSE 3 25 b/b.1/b.1.1/b.1.1.1 4 FALSE TRUE matrix FALSE 1-1 26 b/b.1/b.1.1/b.1.1.2 4 FALSE TRUE matrix FALSE 2-2 27 b/b.1/b.1.1/b.1.1.3 4 FALSE TRUE matrix FALSE 1-1 28 b/b.1/b.1.2 3 FALSE FALSE list FALSE 3 29 b/b.1/b.1.2/b.1.2.1 4 FALSE TRUE data.frame FALSE 1-1 30 b/b.1/b.1.2/b.1.2.2 4 FALSE TRUE data.frame FALSE 3-2 31 b/b.1/b.1.2/b.1.2.3 4 FALSE TRUE data.frame FALSE 0-0 32 b/b.2 2 FALSE FALSE list FALSE 1 33 b/b.2/b.2.1 3 FALSE FALSE list FALSE 4 34 b/b.2/b.2.1/b.2.1.1 4 FALSE FALSE list FALSE 1 35 b/b.2/b.2.1/b.2.1.1/a 5 FALSE TRUE numeric FALSE 1 36 b/b.2/b.2.1/b.2.1.2 4 FALSE FALSE list FALSE 2 37 b/b.2/b.2.1/b.2.1.2/a 5 FALSE TRUE integer FALSE 2 38 b/b.2/b.2.1/b.2.1.2/b 5 FALSE TRUE character FALSE 2 39 b/b.2/b.2.1/b.2.1.3 4 FALSE FALSE list FALSE 5 40 b/b.2/b.2.1/b.2.1.3/a 5 FALSE TRUE integer FALSE 2 41 b/b.2/b.2.1/b.2.1.3/b 5 FALSE TRUE character FALSE 1 42 b/b.2/b.2.1/b.2.1.3/c 5 FALSE TRUE integer FALSE 3 43 b/b.2/b.2.1/b.2.1.3/d 5 FALSE TRUE logical FALSE 1 44 b/b.2/b.2.1/b.2.1.3/e 5 FALSE TRUE integer FALSE 5 45 b/b.2/b.2.1/b.2.1.4 4 FALSE FALSE list FALSE 0 46 c 1 TRUE FALSE list FALSE 1 47 c/c.1 2 FALSE FALSE list FALSE 3 48 c/c.1/c.1.1 3 FALSE TRUE environment FALSE 49 c/c.1/c.1.2 3 FALSE TRUE TESTCLASS_X TRUE 1 50 c/c.1/c.1.3 3 FALSE TRUE TESTCLASS_Y TRUE 1 > objectIndex(src=src, handle.preserve="all") name pos is.top is.bottom class is.s4 dim 1 a 1 TRUE FALSE list FALSE 5 2 a/a.1 2 FALSE TRUE list FALSE 3 3 a/a.2 2 FALSE TRUE list FALSE 3 4 a/a.3 2 FALSE TRUE list FALSE 3 5 a/a.4 2 FALSE TRUE list FALSE 3 6 a/a.5 2 FALSE TRUE list FALSE 3 7 b 1 TRUE FALSE list FALSE 2 8 b/b.1 2 FALSE FALSE list FALSE 2 9 b/b.1/b.1.1 3 FALSE FALSE list FALSE 3 10 b/b.1/b.1.1/b.1.1.1 4 FALSE TRUE matrix FALSE 1-1 11 b/b.1/b.1.1/b.1.1.2 4 FALSE TRUE matrix FALSE 2-2 12 b/b.1/b.1.1/b.1.1.3 4 FALSE TRUE matrix FALSE 1-1 13 b/b.1/b.1.2 3 FALSE FALSE list FALSE 3 14 b/b.1/b.1.2/b.1.2.1 4 FALSE TRUE data.frame FALSE 1-1 15 b/b.1/b.1.2/b.1.2.2 4 FALSE TRUE data.frame FALSE 3-2 16 b/b.1/b.1.2/b.1.2.3 4 FALSE TRUE data.frame FALSE 0-0 17 b/b.2 2 FALSE FALSE list FALSE 1 18 b/b.2/b.2.1 3 FALSE FALSE list FALSE 4 19 b/b.2/b.2.1/b.2.1.1 4 FALSE TRUE list FALSE 1 20 b/b.2/b.2.1/b.2.1.2 4 FALSE TRUE list FALSE 2 21 b/b.2/b.2.1/b.2.1.3 4 FALSE TRUE list FALSE 5 22 b/b.2/b.2.1/b.2.1.4 4 FALSE FALSE list FALSE 0 23 c 1 TRUE FALSE list FALSE 1 24 c/c.1 2 FALSE FALSE list FALSE 3 25 c/c.1/c.1.1 3 FALSE TRUE environment FALSE 26 c/c.1/c.1.2 3 FALSE FALSE TESTCLASS_X TRUE 1 27 c/c.1/c.1.3 3 FALSE FALSE TESTCLASS_Y TRUE 1 > objectIndex(src=src, handle.preserve="data.frame") name pos is.top is.bottom class is.s4 dim 1 a 1 TRUE FALSE list FALSE 5 2 a/a.1 2 FALSE FALSE list FALSE 3 3 a/a.1/a.1.1 3 FALSE TRUE numeric FALSE 1 4 a/a.1/a.1.2 3 FALSE TRUE integer FALSE 5 5 a/a.1/a.1.3 3 FALSE TRUE integer FALSE 0 6 a/a.2 2 FALSE FALSE list FALSE 3 7 a/a.2/a.2.1 3 FALSE TRUE character FALSE 1 8 a/a.2/a.2.2 3 FALSE TRUE character FALSE 5 9 a/a.2/a.2.3 3 FALSE TRUE character FALSE 0 10 a/a.3 2 FALSE FALSE list FALSE 3 11 a/a.3/a.3.1 3 FALSE TRUE numeric FALSE 1 12 a/a.3/a.3.2 3 FALSE TRUE numeric FALSE 2 13 a/a.3/a.3.3 3 FALSE TRUE numeric FALSE 0 14 a/a.4 2 FALSE FALSE list FALSE 3 15 a/a.4/a.4.1 3 FALSE TRUE complex FALSE 1 16 a/a.4/a.4.2 3 FALSE TRUE complex FALSE 3 17 a/a.4/a.4.3 3 FALSE TRUE complex FALSE 0 18 a/a.5 2 FALSE FALSE list FALSE 3 19 a/a.5/a.5.1 3 FALSE TRUE logical FALSE 1 20 a/a.5/a.5.2 3 FALSE TRUE logical FALSE 3 21 a/a.5/a.5.3 3 FALSE TRUE logical FALSE 0 22 b 1 TRUE FALSE list FALSE 2 23 b/b.1 2 FALSE FALSE list FALSE 2 24 b/b.1/b.1.1 3 FALSE FALSE list FALSE 3 25 b/b.1/b.1.1/b.1.1.1 4 FALSE TRUE matrix FALSE 1-1 26 b/b.1/b.1.1/b.1.1.2 4 FALSE TRUE matrix FALSE 2-2 27 b/b.1/b.1.1/b.1.1.3 4 FALSE TRUE matrix FALSE 1-1 28 b/b.1/b.1.2 3 FALSE FALSE list FALSE 3 29 b/b.1/b.1.2/b.1.2.1 4 FALSE TRUE data.frame FALSE 1-1 30 b/b.1/b.1.2/b.1.2.2 4 FALSE TRUE data.frame FALSE 3-2 31 b/b.1/b.1.2/b.1.2.3 4 FALSE TRUE data.frame FALSE 0-0 32 b/b.2 2 FALSE FALSE list FALSE 1 33 b/b.2/b.2.1 3 FALSE FALSE list FALSE 4 34 b/b.2/b.2.1/b.2.1.1 4 FALSE FALSE list FALSE 1 35 b/b.2/b.2.1/b.2.1.1/a 5 FALSE TRUE numeric FALSE 1 36 b/b.2/b.2.1/b.2.1.2 4 FALSE FALSE list FALSE 2 37 b/b.2/b.2.1/b.2.1.2/a 5 FALSE TRUE integer FALSE 2 38 b/b.2/b.2.1/b.2.1.2/b 5 FALSE TRUE character FALSE 2 39 b/b.2/b.2.1/b.2.1.3 4 FALSE FALSE list FALSE 5 40 b/b.2/b.2.1/b.2.1.3/a 5 FALSE TRUE integer FALSE 2 41 b/b.2/b.2.1/b.2.1.3/b 5 FALSE TRUE character FALSE 1 42 b/b.2/b.2.1/b.2.1.3/c 5 FALSE TRUE integer FALSE 3 43 b/b.2/b.2.1/b.2.1.3/d 5 FALSE TRUE logical FALSE 1 44 b/b.2/b.2.1/b.2.1.3/e 5 FALSE TRUE integer FALSE 5 45 b/b.2/b.2.1/b.2.1.4 4 FALSE FALSE list FALSE 0 46 c 1 TRUE FALSE list FALSE 1 47 c/c.1 2 FALSE FALSE list FALSE 3 48 c/c.1/c.1.1 3 FALSE TRUE environment FALSE 49 c/c.1/c.1.2 3 FALSE FALSE TESTCLASS_X TRUE 1 50 c/c.1/c.1.2/a 5 FALSE TRUE integer FALSE 5 51 c/c.1/c.1.3 3 FALSE FALSE TESTCLASS_Y TRUE 1 52 c/c.1/c.1.3/a 4 FALSE TRUE integer FALSE 5 > objectIndex(src=src, handle.preserve="s4") name pos is.top is.bottom class is.s4 dim 1 a 1 TRUE FALSE list FALSE 5 2 a/a.1 2 FALSE FALSE list FALSE 3 3 a/a.1/a.1.1 3 FALSE TRUE numeric FALSE 1 4 a/a.1/a.1.2 3 FALSE TRUE integer FALSE 5 5 a/a.1/a.1.3 3 FALSE TRUE integer FALSE 0 6 a/a.2 2 FALSE FALSE list FALSE 3 7 a/a.2/a.2.1 3 FALSE TRUE character FALSE 1 8 a/a.2/a.2.2 3 FALSE TRUE character FALSE 5 9 a/a.2/a.2.3 3 FALSE TRUE character FALSE 0 10 a/a.3 2 FALSE FALSE list FALSE 3 11 a/a.3/a.3.1 3 FALSE TRUE numeric FALSE 1 12 a/a.3/a.3.2 3 FALSE TRUE numeric FALSE 2 13 a/a.3/a.3.3 3 FALSE TRUE numeric FALSE 0 14 a/a.4 2 FALSE FALSE list FALSE 3 15 a/a.4/a.4.1 3 FALSE TRUE complex FALSE 1 16 a/a.4/a.4.2 3 FALSE TRUE complex FALSE 3 17 a/a.4/a.4.3 3 FALSE TRUE complex FALSE 0 18 a/a.5 2 FALSE FALSE list FALSE 3 19 a/a.5/a.5.1 3 FALSE TRUE logical FALSE 1 20 a/a.5/a.5.2 3 FALSE TRUE logical FALSE 3 21 a/a.5/a.5.3 3 FALSE TRUE logical FALSE 0 22 b 1 TRUE FALSE list FALSE 2 23 b/b.1 2 FALSE FALSE list FALSE 2 24 b/b.1/b.1.1 3 FALSE FALSE list FALSE 3 25 b/b.1/b.1.1/b.1.1.1 4 FALSE TRUE matrix FALSE 1-1 26 b/b.1/b.1.1/b.1.1.2 4 FALSE TRUE matrix FALSE 2-2 27 b/b.1/b.1.1/b.1.1.3 4 FALSE TRUE matrix FALSE 1-1 28 b/b.1/b.1.2 3 FALSE FALSE list FALSE 3 29 b/b.1/b.1.2/b.1.2.1 4 FALSE TRUE data.frame FALSE 1-1 30 b/b.1/b.1.2/b.1.2.1/X1 5 FALSE TRUE numeric FALSE 1 31 b/b.1/b.1.2/b.1.2.2 4 FALSE TRUE data.frame FALSE 3-2 32 b/b.1/b.1.2/b.1.2.2/a 5 FALSE TRUE integer FALSE 3 33 b/b.1/b.1.2/b.1.2.2/b 5 FALSE TRUE integer FALSE 3 34 b/b.1/b.1.2/b.1.2.3 4 FALSE TRUE data.frame FALSE 0-0 35 b/b.2 2 FALSE FALSE list FALSE 1 36 b/b.2/b.2.1 3 FALSE FALSE list FALSE 4 37 b/b.2/b.2.1/b.2.1.1 4 FALSE FALSE list FALSE 1 38 b/b.2/b.2.1/b.2.1.1/a 5 FALSE TRUE numeric FALSE 1 39 b/b.2/b.2.1/b.2.1.2 4 FALSE FALSE list FALSE 2 40 b/b.2/b.2.1/b.2.1.2/a 5 FALSE TRUE integer FALSE 2 41 b/b.2/b.2.1/b.2.1.2/b 5 FALSE TRUE character FALSE 2 42 b/b.2/b.2.1/b.2.1.3 4 FALSE FALSE list FALSE 5 43 b/b.2/b.2.1/b.2.1.3/a 5 FALSE TRUE integer FALSE 2 44 b/b.2/b.2.1/b.2.1.3/b 5 FALSE TRUE character FALSE 1 45 b/b.2/b.2.1/b.2.1.3/c 5 FALSE TRUE integer FALSE 3 46 b/b.2/b.2.1/b.2.1.3/d 5 FALSE TRUE logical FALSE 1 47 b/b.2/b.2.1/b.2.1.3/e 5 FALSE TRUE integer FALSE 5 48 b/b.2/b.2.1/b.2.1.4 4 FALSE FALSE list FALSE 0 49 c 1 TRUE FALSE list FALSE 1 50 c/c.1 2 FALSE FALSE list FALSE 3 51 c/c.1/c.1.1 3 FALSE TRUE environment FALSE 52 c/c.1/c.1.2 3 FALSE TRUE TESTCLASS_X TRUE 1 53 c/c.1/c.1.3 3 FALSE TRUE TESTCLASS_Y TRUE 1 > objectIndex(src=src, handle.preserve=c("data.frame", "s4")) name pos is.top is.bottom class is.s4 dim 1 a 1 TRUE FALSE list FALSE 5 2 a/a.1 2 FALSE FALSE list FALSE 3 3 a/a.1/a.1.1 3 FALSE TRUE numeric FALSE 1 4 a/a.1/a.1.2 3 FALSE TRUE integer FALSE 5 5 a/a.1/a.1.3 3 FALSE TRUE integer FALSE 0 6 a/a.2 2 FALSE FALSE list FALSE 3 7 a/a.2/a.2.1 3 FALSE TRUE character FALSE 1 8 a/a.2/a.2.2 3 FALSE TRUE character FALSE 5 9 a/a.2/a.2.3 3 FALSE TRUE character FALSE 0 10 a/a.3 2 FALSE FALSE list FALSE 3 11 a/a.3/a.3.1 3 FALSE TRUE numeric FALSE 1 12 a/a.3/a.3.2 3 FALSE TRUE numeric FALSE 2 13 a/a.3/a.3.3 3 FALSE TRUE numeric FALSE 0 14 a/a.4 2 FALSE FALSE list FALSE 3 15 a/a.4/a.4.1 3 FALSE TRUE complex FALSE 1 16 a/a.4/a.4.2 3 FALSE TRUE complex FALSE 3 17 a/a.4/a.4.3 3 FALSE TRUE complex FALSE 0 18 a/a.5 2 FALSE FALSE list FALSE 3 19 a/a.5/a.5.1 3 FALSE TRUE logical FALSE 1 20 a/a.5/a.5.2 3 FALSE TRUE logical FALSE 3 21 a/a.5/a.5.3 3 FALSE TRUE logical FALSE 0 22 b 1 TRUE FALSE list FALSE 2 23 b/b.1 2 FALSE FALSE list FALSE 2 24 b/b.1/b.1.1 3 FALSE FALSE list FALSE 3 25 b/b.1/b.1.1/b.1.1.1 4 FALSE TRUE matrix FALSE 1-1 26 b/b.1/b.1.1/b.1.1.2 4 FALSE TRUE matrix FALSE 2-2 27 b/b.1/b.1.1/b.1.1.3 4 FALSE TRUE matrix FALSE 1-1 28 b/b.1/b.1.2 3 FALSE FALSE list FALSE 3 29 b/b.1/b.1.2/b.1.2.1 4 FALSE TRUE data.frame FALSE 1-1 30 b/b.1/b.1.2/b.1.2.2 4 FALSE TRUE data.frame FALSE 3-2 31 b/b.1/b.1.2/b.1.2.3 4 FALSE TRUE data.frame FALSE 0-0 32 b/b.2 2 FALSE FALSE list FALSE 1 33 b/b.2/b.2.1 3 FALSE FALSE list FALSE 4 34 b/b.2/b.2.1/b.2.1.1 4 FALSE FALSE list FALSE 1 35 b/b.2/b.2.1/b.2.1.1/a 5 FALSE TRUE numeric FALSE 1 36 b/b.2/b.2.1/b.2.1.2 4 FALSE FALSE list FALSE 2 37 b/b.2/b.2.1/b.2.1.2/a 5 FALSE TRUE integer FALSE 2 38 b/b.2/b.2.1/b.2.1.2/b 5 FALSE TRUE character FALSE 2 39 b/b.2/b.2.1/b.2.1.3 4 FALSE FALSE list FALSE 5 40 b/b.2/b.2.1/b.2.1.3/a 5 FALSE TRUE integer FALSE 2 41 b/b.2/b.2.1/b.2.1.3/b 5 FALSE TRUE character FALSE 1 42 b/b.2/b.2.1/b.2.1.3/c 5 FALSE TRUE integer FALSE 3 43 b/b.2/b.2.1/b.2.1.3/d 5 FALSE TRUE logical FALSE 1 44 b/b.2/b.2.1/b.2.1.3/e 5 FALSE TRUE integer FALSE 5 45 b/b.2/b.2.1/b.2.1.4 4 FALSE FALSE list FALSE 0 46 c 1 TRUE FALSE list FALSE 1 47 c/c.1 2 FALSE FALSE list FALSE 3 48 c/c.1/c.1.1 3 FALSE TRUE environment FALSE 49 c/c.1/c.1.2 3 FALSE TRUE TESTCLASS_X TRUE 1 50 c/c.1/c.1.3 3 FALSE TRUE TESTCLASS_Y TRUE 1
Benchmarking
> require(microbenchmark) > res <- microbenchmark(objectIndex(src=src)) median(res$time/1000000000) > [1] 0.3378009
Here’s the code:
Generics
setGeneric( name="argProcess", signature=c("arg"), def=function( arg, ... ){ standardGeneric("argProcess") } ) setGeneric( name="gregexpr2", signature=c("src"), def=function( src, ... ){ standardGeneric("gregexpr2") } ) setGeneric( name="objectIndex", signature=c("src"), def=function( src, ... ){ standardGeneric("objectIndex") } )
Methods
setMethod( f="argProcess", signature=signature(arg="character"), definition=function( arg, value.valid=NULL, idx.restrict=NULL, do.remove.null=FALSE, ... ){ #--------------------------------------------------------------------------- # ACTUAL PROCESSING #--------------------------------------------------------------------------- if(!is.null(value.valid)){ subsetValidate(src=arg, tgt=value.valid, .ARGS=.ARGS) } out <- arg # if(length(arg) > 1){ # out <- arg[1] # } if(!is.null(idx.restrict)){ if(!is.numeric(idx.restrict)){ stop(paste("/invalid restriction index", sep="")) } if(length(idx.restrict) > length(out)){ stop(paste("/invalid restriction index length", sep="")) } if(idx.restrict == 0){ if(length(out) > 1){ msg <- c( paste("/invalid argument value", sep=""), "\n", paste("* Value: '", paste(out, collapse=", "), "'", sep=""), "\n", paste("* Needs to be singular", sep="") ) stop(msg) } } else { out <- out[idx.restrict] } } # REMOVE NULL if(do.remove.null){ idx.null <- which(is.null(out)) if(length(idx.null)){ out <- out[-idx.null] } } # / # /ACTUAL PROCESSING ---------- #--------------------------------------------------------------------------- # FINALIZE #--------------------------------------------------------------------------- return(out) # /FINALIZE ---------- } ) setMethod( f="gregexpr2", signature=signature(src="character"), definition=function( src, pattern, do.ignore.case=FALSE, do.perl=TRUE, do.fixed=FALSE, do.useBytes=FALSE, do.simplify=FALSE, ... ){ #--------------------------------------------------------------------------- # ACTUAL PROCESSING #--------------------------------------------------------------------------- if(all(is.na(src))){ out <- lapply(1:length(src), function(x){ out <- data.frame(start=NA, stop=NA) }) } else { idx.regex <- gregexpr( text=src, pattern=pattern, ignore.case=do.ignore.case, perl=do.perl, fixed=do.fixed, useBytes=do.useBytes ) out <- lapply(idx.regex, function(x){ if(any(x == -1)){ out <- data.frame(start=NA, stop=NA) } else { out <- data.frame(start=x, stop=x+(attributes(x)$match.length-1)) } return(out) }) } if(do.simplify){ if(length(out) == 1){ out <- out[[1]] } } # /ACTUAL PROCESSING ---------- #--------------------------------------------------------------------------- # FINALIZE #--------------------------------------------------------------------------- return(out) # /FINALIZE ---------- } ) setMethod( f="objectIndex", signature=signature(src="list"), definition=function( src, handle.preserve=c("standard", "all", "no", "data.frame", "s4"), .delim.path="/", ... ){ #--------------------------------------------------------------------------- # GENERAL PREPROCESSING #--------------------------------------------------------------------------- handle.preserve <- argProcess( arg=handle.preserve, value.valid=c("standard", "all", "no", "data.frame", "s4") ) if("standard" %in% handle.preserve){ handle.preserve <- "standard" } # /GENERAL PREPROCESSING ---------- #--------------------------------------------------------------------------- # PREPROCESSING #--------------------------------------------------------------------------- # TRANSFORM IF FLAT if(all(length(grep(.delim.path, names(src))>0))){ # This is assuming that src is already an flattened list. cat(paste("/preprocessing 'src' (deepening) ...", sep=""), sep="\n") src <- objectDeepen(src=src, .ARGS=.ARGS) } # / pattern <- "(\\.\\.)" # RETRIEVE STRUCTURE temp <- capture.output(str(src)) temp <- temp[-1] # / # DROP ATTRIBUTES idx <- grep("\\.- attr\\(", temp) idx.drop <- unlist(lapply(idx, function(x.idx){ temp.attr <- temp[x.idx] idx.1 <- gregexpr2( src=temp.attr, pattern="(\\[.*\\])|([[:digit:]]+\\s*obs|of|with).*[[:digit:]]+", do.simplify=TRUE ) out <- NULL if(all(!is.na(unlist(idx.1)))){ out <- as.numeric( gsub("[[:alpha:]]|[[:punct:]]|\\s", "", substr(temp.attr, start=idx.1$start, stop=idx.1$stop)) ) out <- x.idx:(x.idx+out) } })) if(length(idx.drop)){ temp <- temp[-idx.drop] } #temp # / # DROP METHODS idx.drop <- grep("\\..*[[:digit:]]+\\s*methods", temp) if(length(idx.drop)){ temp <- temp[-idx.drop] } #temp # / # /PREPROCESSING ---------- #--------------------------------------------------------------------------- # CLASS + DIMENSION #--------------------------------------------------------------------------- # GET INDEX idx <- gregexpr2( src=temp, pattern=":.*$" ) # / # CLASS AND DIM INFO class.info <- lapply(seq(along=idx), function(x.idx){ out.0 <- substr(temp[x.idx], start=idx[[x.idx]]$start, stop=idx[[x.idx]]$stop) if(is.na(out.0)){ return(NULL) } # DIMENSION / LENGTH out.dim <- 1 # Check for zero length idx.1 <- gregexpr2( src=out.0, pattern="^:\\s*\\w+\\(.*\\)", do.simplify=TRUE ) if(!all(is.na(unlist(idx.1)))){ out.dim <- 0 idx.1 <- NA } else { # Check for non-zero length idx.1 <- gregexpr2( src=out.0, # pattern="\\[[.*\\]]", pattern="(\\[\\d+,?.*\\])|(:\t[[:digit:]]|(of|with).*[[:digit:]]+)", do.simplify=TRUE ) } #print(idx.1) # out.dim <- NA if(all(!is.na(unlist(idx.1)))){ #x.idx=1 out.dim <- lapply(1:nrow(idx.1), function(x.idx){ out.dim <- substr(out.0, start=idx.1$start[x.idx], stop=idx.1$stop[x.idx]) out.dim <- unlist(strsplit(out.dim, split=", ")) out.dim <- lapply(out.dim, function(x.dim){ out.dim <- x.dim idx.1 <- gregexpr2( src=out.dim, pattern="[[:digit:]]+", do.simplify=TRUE ) out.dim <- sapply(1:nrow(idx.1), function(x.row){ substr(out.dim, start=idx.1$start[x.row], stop=idx.1$stop[x.row]) }) }) if(length(out.dim) == 1){ out.dim <- out.dim[[1]] } out.dim }) } #print(out.dim) # / # CLASS idx.s4 <- FALSE idx.1 <- gregexpr2( src=out.0, pattern="^:\\s*<?\\w+:?|('.*')", do.simplify=TRUE ) out.class <- NA if(all(!is.na(idx.1$start))){ out.class <- sapply(1:nrow(idx.1), function(x.row){ out <- substr(out.0, start=idx.1$start[x.row], stop=idx.1$stop[x.row]) out <- gsub("^[[:punct:]]*|[[:punct:]]*$|\\s*", "", out) }) } if(class(out.dim[[1]]) == "list" & any(out.class != "data.frame")){ out.class <- "matrix" } else { out.dim <- unlist(out.dim) } out.class <- switch( out.class[1], "chr"="character", "cplx"="complex", "data.frame"={ out.dim <- paste(out.dim, collapse="-") "data.frame" }, "environment"={ out.dim <- NA "environment" }, "Formal"={ idx.s4 <- TRUE out.class[2] }, "int"="integer", "list"="list", "List"="list", "logi"="logical", "matrix"={ out.dim <- sapply(out.dim[[1]], function(x){ out.dim <- x if(length(out.dim) > 1){ out.dim <- out.dim[2] } out.dim }) out.dim <- paste(out.dim, collapse="-") "matrix" }, "num"="numeric", "Reference"={ idx.s4 <- TRUE out.class[2] } ) # / if(length(out.dim) > 1){ out.dim <- out.dim[2] } out <- data.frame(class=out.class, is.s4=idx.s4, dim=out.dim, stringsAsFactors=FALSE) return(out) }) class.info <- do.call("rbind", class.info) # / idx.na <- which(is.na(class.info$class)) if(length(idx.na)){ class.info <- class.info[-idx.na,] } # /CLASS + DIMENSION ---------- #--------------------------------------------------------------------------- # MAIN PROCESSING #--------------------------------------------------------------------------- temp <- gsub(":.*$", "", temp) temp <- gsub("\\s*\\$\\s*", "", temp) temp <- gsub("^\\s*", "", temp) temp <- gsub("@", "", temp) struc.names <- gsub("\\s*$", "", temp) #struc.names # / # TRANSFORM STRUCTURE idx.regex <- gregexpr2( src=struc.names, pattern=pattern ) struc.names <- gsub(paste(pattern, "|\\s*", sep=""), "", struc.names) struc.pos <- sapply(idx.regex, function(x){ out <- 0 if(all(!is.na(x$start))){ out <- nrow(x) } return(out) }) # STOP ON EMPTY NAMES idx.empty <- which(struc.names == "") if(length(idx.empty)){ msg <- c( paste("/names error:", sep=""), "\n", paste("* Names: {", paste(struc.names, collapse=", "), "}", sep="") ) stop(msg) } # / # STRUCTURE pos <- -1 path.0 <- NULL path.1 <- NULL is.top <- NULL idx.zero <- which(struc.pos == 0) for(x in 1:length(struc.pos)){ res <- struc.pos[x] pos.new <- list(res) path.0.new <- struc.names[x] if(struc.pos[x] > pos){ pos <- struc.pos[x] } else { # RESET if(struc.pos[x] < pos){ pos <- struc.pos[x] } if(pos == 0){ path.0 <- NULL } else { path.0 <- path.0[1:pos] } # / pos.new <- list(0:res) } path.0 <- c(path.0, path.0.new) path.1.new <- paste(path.0, collapse=.delim.path) path.1 <- c(path.1, path.1.new) is.top.new <- length(path.0) == 1 is.top <- c(is.top, is.top.new) } # STRUCTURE DF struc.df <- data.frame( name=path.1, pos=struc.pos+1, is.top=is.top, is.bottom=FALSE, class.info, stringsAsFactors=FALSE ) # / # BOTTOM BRANCHES classes.pres <- "list" idx.bottom <- which(!(struc.df$class %in% classes.pres) & !struc.df$is.s4) if(length(idx.bottom)){ struc.df$is.bottom[idx.bottom] <- TRUE } # / # HANDLE PRESERVE if(!("no" %in% handle.preserve)){ if(any(c("standard", "data.frame") %in% handle.preserve)){ idx.pres <- which(struc.df$class == "data.frame") if(length(idx.pres)){ struc.df$is.bottom[idx.pres] <- TRUE obj.dim <- sapply(strsplit(struc.df$dim[idx.pres], split="-"), function(x){ as.numeric(x[2]) }) idx.drop <- which(obj.dim == 0) if(length(idx.drop)){ obj.dim <- obj.dim[-idx.drop] idx.pres <- idx.pres[-idx.drop] } idx.temp <- data.frame(start=idx.pres+1, stop=idx.pres+obj.dim) idx.drop <- unlist(lapply(1:nrow(idx.temp), function(x.row){ idx.temp$start[x.row]:idx.temp$stop[x.row] })) if(length(idx.drop)){ struc.df <- struc.df[-idx.drop,] rownames(struc.df) <- NULL } } } if(any(c("standard", "s4") %in% handle.preserve)){ idx.pres <- which(struc.df$is.s4) if(length(idx.pres)){ struc.df$is.bottom[idx.pres] <- TRUE obj.dim <- as.numeric(struc.df$dim[idx.pres]) idx.temp <- data.frame(start=idx.pres+1, stop=idx.pres+obj.dim) idx.drop <- unlist(lapply(1:nrow(idx.temp), function(x.row){ idx.temp$start[x.row]:idx.temp$stop[x.row] })) idx.keep <- which(idx.drop %in% idx.pres) if(length(idx.keep)){ idx.drop <- idx.drop[-idx.keep] } if(length(idx.drop)){ struc.df <- struc.df[-idx.drop,] rownames(struc.df) <- NULL } } } if(any("all" %in% handle.preserve)){ idx.pres <- which(struc.df$class == "list") if(length(idx.pres)){ classes.pres <- c("data.frame", "environment", "list", "matrix") idx.drop <- which( struc.df$is.bottom & !(struc.df$class %in% classes.pres) & !(struc.df$is.s4) ) idx.change <- idx.pres + 1 idx <- which(idx.change %in% idx.drop) if(length(idx)){ idx.change <- idx.change[idx] - 1 } struc.df$is.bottom[idx.change] <- TRUE if(length(idx.drop)){ struc.df <- struc.df[-idx.drop,] rownames(struc.df) <- NULL } } } } # / out <- struc.df # /MAIN PROCESSING ---------- #--------------------------------------------------------------------------- # FINALIZE #--------------------------------------------------------------------------- return(out) # /FINALIZE ---------- } ) subsetValidate <- function( src, tgt, do.index=FALSE, index.type=c("logical", "numeric", "logical.all", "logical.any"), do.strict=TRUE, do.warning=TRUE, do.allow.nomatch=FALSE, value.nomatch=NULL, ... ){ #--------------------------------------------------------------------------- # GENERAL PREPROCESSING #--------------------------------------------------------------------------- if(missing(src) | missing(tgt)){ stop("/missing arguments 'src' and/or 'tgt'", sep="") } # VALIDATING AND PROCESSING SOURCE AND TARGET do.special.src <- FALSE if(!is.character(src)){ if(.do.verbose | .do.debug){ cat(paste("/non-character 'src'. Processing ...", sep=""), sep="\n") } if(!is.na(src)){ if(is.null(names(src))){ stop(paste("/'src' has no names. Processing failed.", sep=""), sep="\n") } } do.special.src <- TRUE src.0 <- src src <- names(src) } do.special.tgt <- FALSE if(!is.character(tgt)){ if(.do.verbose | .do.debug){ cat(paste("/non-character 'tgt'. Processing ...", sep=""), sep="\n") } if(!is.na(tgt)){ if(is.null(names(tgt))){ stop(paste("/'tgt' has no names. Processing failed.", sep=""), sep="\n") } } do.special.tgt <- TRUE tgt.0 <- tgt tgt <- names(tgt) } # / # if(!is.character(src) | !is.character(tgt)){ # stop(paste( # "/expecting args 'src' and 'tgt' to be of class 'character'", # sep="")) # } index.type.valid <- c("logical", "numeric", "logical.all", "logical.any") if(!all(index.type %in% index.type.valid)){ msg <- c( paste("/arg 'index.type' is invalid:", sep=""), "\n", paste("* Value: ", paste(deparse(index.type), collapse=""), sep=""), "\n", paste("* Valid: ", paste(deparse(index.type.valid), collapse=""), sep="") ) stop(msg) } if(do.index & length(index.type) > 1){ index.type <- index.type[1] } # /GENERAL PREPROCESSING ---------- #--------------------------------------------------------------------------- # ACTUAL PROCESSING #--------------------------------------------------------------------------- idx.0 <- src %in% tgt idx.1 <- which(idx.0) if(do.index){ if(.do.verbose | .do.debug){ cat(paste("/swiching arg 'do.strict = FALSE", sep=""), sep="\n") cat(paste("/swiching arg 'do.warning = FALSE", sep=""), sep="\n") } do.strict <- FALSE do.warning <- FALSE if(index.type == "logical"){ out.idx <- idx.0 names(out.idx) <- src } if(index.type == "numeric"){ out.idx <- idx.1 names(out.idx) <- src[out.idx] } if(index.type == "logical.all"){ out.idx <- all(idx.0) } if(index.type == "logical.any"){ out.idx <- any(idx.0) } } # NO MATCH if(!length(idx.1)){ if(!do.index){ if(do.allow.nomatch){ # LOCAL TERMINATE return(value.nomatch) } else { msg <- c( paste("/non-matching elements in 'src' (all):", sep=""), "\n", paste("* '", src, "'", sep="") ) stop(msg) } } else { if(index.type == "numeric"){ out.idx <- numeric(0) } return(any(idx.0)) } } # / # PARTLY MATCH if(!all(idx.0)){ msg <- c( paste("/partly non-matching elements in 'src':", sep=""), "\n", paste("* '", src[which(!idx.0)], "'", sep="") ) if(do.strict){ if(do.allow.nomatch){ # LOCAL TERMINATE return(value.nomatch) } else { stop(msg) } } else { if(do.warning){ warning(msg) } } } # / if(!do.index){ if(do.special.src){ if(is.data.frame(src.0) | is.matrix(src.0)){ tgt <- src.0[,idx.1] } else { tgt <- src.0[idx.1] } } else { tgt <- src[idx.1] } out <- tgt } else { out <- out.idx } # / #--------------------------------------------------------------------------- # FINALIZE #--------------------------------------------------------------------------- return(out) # /FINALIZE ---------- }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.