Site icon R-bloggers

Managing memory in a list of lists data structure

[This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

First, a confession: instead of using classes and defining methods for them, I build a lot of ad hoc data structures out of lists and then build up one-off methods that operate on those lists of lists. I think this is a perl-ism that has transferred into my R code. I might eventually learn how to do classes, but this hack has been working well enough.

One issue I ran into today is that it was getting tedious to find out which objects stored in the list of lists was taking up the most memory. I ended up writing this rather silly recursive function that may be of use to you if you also have been scarred by perl.

# A hacked together function for exploring these structures
get.size <- function( obj.to.size, units='Kb') {
  # Check if the object we were passed is a list
  # N.B. Since is(list()) returns c('list', 'vector') we need a
  #      multiple value comparison like all.equal
  # N.B. Since all.equal will either return TRUE or a vector of 
  #      differences wrapping it in is.logical is the same as 
  #      checking if it returned TRUE. 
  if ( is.logical( all.equal( is(obj.to.size) , is(list())))) {
    # Iterate over each element of the list
    lapply( obj.to.size ,
      function(xx){
        # Calculate the size of the current element of the list
        # N.B. object.size always returns bytes, but its print 
        #      allows different units. Using capture.output allows
        #      us to do the conversion with the print method
        the.size <- capture.output(print(object.size(xx), units=units))
        # This object may itself be a list...
        if( is.logical( all.equal( is(xx), is(list())))) {
           # if so, recurse if we aren't already at zero size 
           if( the.size != paste(0, units) ) {
             the.rest <- get.size( xx , units)
             return( list(the.size, the.rest) )
           }else {
             # Or just return the zero size
             return( the.size )             
           }
        } else {
           # the element isn't a list, just return its size
           return( the.size)
        }
      })
  } else {
    # If the object wasn't a list, return an error.
    stop("The object passed to this function was not a list.")
  }
}

The output looks something like this

$models
$models[[1]]
[1] "2487.7 Kb"

$models[[2]]
$models[[2]]$naive.model
[1] "871 Kb"

$models[[2]]$clustered.model
[1] "664.5 Kb"

$models[[2]]$gls.model
[1] "951.9 Kb"



$V
[1] "4628.2 Kb"

$fixed.formula
[1] "1.2 Kb"

$random.formula
[1] "2.6 Kb"

where the first element of the list is the sum of everything below it in the hierarchy. Therefore, the whole “models” is 2487.7 Kb and “models$naive.model” is only 871 Kb of that total.

To leave a comment for the author, please follow the link and comment on their blog: Nathan VanHoudnos » rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.