Managing memory in a list of lists data structure
First, a confession: instead of using classes and defining methods for them, I build a lot of ad hoc data structures out of lists and then write one-off functions that operate on those lists of lists. I think this is a Perl-ism that has transferred into my R code. I might eventually learn how to do classes, but this hack has been working well enough.
One issue I ran into today is that it was getting tedious to find out which objects stored in the list of lists were taking up the most memory. I ended up writing this rather silly recursive function that may be of use to you if you have also been scarred by Perl.
# A hacked together function for exploring these structures
get.size <- function( obj.to.size, units='Kb') {
  # Check if the object we were passed is a list
  # N.B. Since is(list()) returns c('list', 'vector') we need a
  #      multiple value comparison like all.equal
  # N.B. Since all.equal will either return TRUE or a vector of
  #      differences wrapping it in is.logical is the same as
  #      checking if it returned TRUE.
  if ( is.logical( all.equal( is(obj.to.size) , is(list())))) {
    # Iterate over each element of the list
    lapply( obj.to.size , function(xx){
      # Calculate the size of the current element of the list
      # N.B. object.size always returns bytes, but its print
      #      allows different units. Using capture.output allows
      #      us to do the conversion with the print method
      the.size <- capture.output(print(object.size(xx), units=units))
      # This object may itself be a list...
      if( is.logical( all.equal( is(xx), is(list())))) {
        # if so, recurse if we aren't already at zero size
        if( the.size != paste(0, units) ) {
          the.rest <- get.size( xx , units)
          return( list(the.size, the.rest) )
        }else {
          # Or just return the zero size
          return( the.size )
        }
      } else {
        # the element isn't a list, just return its size
        return( the.size)
      }
    })
  } else {
    # If the object wasn't a list, return an error.
    stop("The object passed to this function was not a list.")
  }
}
The output looks something like this:
$models
$models[[1]]
[1] "2487.7 Kb"

$models[[2]]
$models[[2]]$naive.model
[1] "871 Kb"

$models[[2]]$clustered.model
[1] "664.5 Kb"

$models[[2]]$gls.model
[1] "951.9 Kb"



$V
[1] "4628.2 Kb"

$fixed.formula
[1] "1.2 Kb"

$random.formula
[1] "2.6 Kb"
where, for each nested list, the first element is the total size of that list and the second element breaks it down by component. Here the whole “models” list is 2487.7 Kb, and “models$naive.model” accounts for only 871 Kb of that total.
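Because the sizes come back as strings, they are easy to read but awkward to sort. If the goal is simply to rank the elements by memory use, one possible variation (a rough sketch, not the function above) is to keep the sizes as numeric bytes in a flat named vector. The names `flat.sizes` and `toy` are made up for this illustration, and the plain-list check uses `class()` rather than the `is()` comparison, which I am assuming behaves the same for ordinary lists.

```r
# Hypothetical helper: walk a list of lists and return the size in bytes
# of every element as a flat, named numeric vector.
flat.sizes <- function(obj, prefix = "") {
  stopifnot(is.list(obj))

  # Fall back to positional labels when elements have no names
  labels <- names(obj)
  if (is.null(labels)) labels <- rep("", length(obj))
  blank <- is.na(labels) | labels == ""
  labels[blank] <- paste0("[[", seq_along(obj)[blank], "]]")

  sizes <- numeric(0)
  for (i in seq_along(obj)) {
    el <- obj[[i]]
    path <- if (prefix == "") labels[i] else paste(prefix, labels[i], sep = "$")

    # object.size() reports bytes; as.numeric() drops the object_size class
    sizes <- c(sizes, setNames(as.numeric(object.size(el)), path))

    # Recurse only into plain, non-empty lists (assumption: classed
    # objects such as fitted models should be treated as leaves)
    if (identical(class(el), "list") && length(el) > 0) {
      sizes <- c(sizes, flat.sizes(el, prefix = path))
    }
  }
  sizes
}

# A toy structure standing in for a real list of lists
toy <- list(
  models = list(small = 1:10, big = numeric(1e5)),
  V      = matrix(0, 100, 100),
  note   = "hello"
)

# Largest elements first, shown in Kb
ranked <- sort(flat.sizes(toy), decreasing = TRUE)
print(round(ranked / 1024, 1))
```

As in the output above, a parent list such as `models` appears alongside its children, so the entries overlap and should not be added together.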