How Best to Convert a Names-Values Tibble to a Named List?
[This article was first published on R – Jocelyn Ireson-Paine's Blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here, in the spirit of my “Experiments with by_row()” post, are some experiments in writing and timing a function spread_to_list that converts a two-column tibble such as:

    x 1
    y 2
    z 3
    t 4

to a named list:

    list( x=1, y=2, z=3, t=4 )

I need this for processing the parameter sheets shown in that by_row post, and I’ll explain why later. In this post, I’m just interested in how best to define spread_to_list.

Judging by the benchmarks at the end of the script below, the best implementations are spread_to_list_3 and spread_to_list_4. Both build the list with setNames().
    # try_spread_to_list.R
    #
    # Consider a tibble t with two columns,
    # where each cell in the second column
    # represents the value associated with
    # the string (assumed to be its name)
    # in the first column.
    #
    # I want to define a function spread_to_list()
    # which converts t into a named list whose names
    # are the names in the first column, and whose
    # values are the values in the second column.
    #
    # For example, if t is:
    #   names       no_of_days
    #   DaysInMay   31
    #   DaysInJune  30
    # then
    #   spread_to_list(t)
    # would be the list
    #   list( DaysInMay = 31, DaysInJune = 30 )
    #
    # The code here tries various ways
    # of implementing spread_to_list,
    # and benchmarks them. Two are variants
    # of one another, using spread() and converting
    # its result. The other two call setNames() on
    # the values column, converting to a list either
    # before or after attaching the names.

    library( tidyverse )
    library( microbenchmark )
    library( stringr )

    t <- tribble( ~a , ~b ,
                  'x', 1  ,
                  'y', 2  ,
                  'z', 3  ,
                  't', 4  )
    #
    # I'm going to try various ways
    # of implementing spread_to_list()
    # on t.

    # First, what happens if I call spread()?
    s <- spread( t, a, b )
    #
    # s becomes a one-row tibble:
    #       t     x     y     z
    #   1   4     1     2     3
    # Note that spread() has put the new columns
    # in alphabetical order, so t comes first.

    # How can I convert s to a named list?
    # An obvious way is to call map().
    # Let's see what the function argument to
    # map() gets passed if I call map() on s.
    map( s, show )
    #
    # Displays
    #   [1] 4
    #   [1] 1
    #   [1] 2
    #   [1] 3
    # So it gets passed a column of the
    # tibble as an atomic vector.

    map( s, function(x)x )
    #
    # Returns a list of these elements:
    #   list(t = 4, x = 1, y = 2, z = 3)
    # That's because map() is defined to return
    # lists. So the call above uses it merely
    # as a type converter.

    map( s, identity )
    #
    # Does the same. identity() is a built-in
    # identity function.

    # But maybe I can avoid mapping. In my
    # experiments with by_row(),
    # http://www.j-paine.org/blog/2017/10/experiments-with-by_row.html ,
    # I discovered that as.list() will convert
    # a tibble to a named list.
    as.list( s )
    #
    # Also returns
    #   list(t = 4, x = 1, y = 2, z = 3).
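Here is a base-R sketch of why as.list() works on the spread() result. A data frame is stored as a list of its columns, and a tibble is built on a data frame. So as.list() mostly just drops the class and row names, and lapply() with identity() does the same job as map( s, identity ). The data frame s_df is a hypothetical stand-in for the one-row tibble s. It is built without tidyverse so the example runs on its own.

```r
# Hypothetical stand-in for the one-row tibble s produced by spread():
# one column per name, one row of values.
s_df <- data.frame(t = 4, x = 1, y = 2, z = 3)

# A data frame is stored as a list of its columns.
is.list(s_df)   # TRUE
length(s_df)    # 4: one element per column

# as.list() drops the data-frame class and row names,
# leaving a plain named list of the columns.
via_as_list <- as.list(s_df)

# lapply() with identity() walks the same columns,
# much as map( s, identity ) does in the tidyverse version.
via_lapply <- lapply(s_df, identity)

identical(via_as_list, via_lapply)                         # TRUE
identical(via_as_list, list(t = 4, x = 1, y = 2, z = 3))   # TRUE
```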
    # But can I avoid spread() altogether?
    # Browsing the discussion groups gave me
    # the idea of trying setNames().
    # Let's try that, passing t's names as its
    # second argument and t's values as its
    # first.
    setNames( t[[2]], t[[1]] )
    #
    # Gives me an atomic named vector.
    # I need to convert it to a list.
    # One way is for the first argument to
    # setNames() to be a list: setNames() returns
    # that argument with names attached, so a
    # list in gives a list out.
    setNames( as.list( t[[2]] ), t[[1]] )
    #
    # Returns the list
    #   list(x = 1, y = 2, z = 3, t = 4)

    # But I could convert the result instead.
    as.list( setNames( t[[2]], t[[1]] ) )
    #
    # Also returns
    #   list(x = 1, y = 2, z = 3, t = 4)

    # Let's try these four implementations.
    # First, define the functions.

    spread_to_list_1 <- function( t )
    {
      colname1 <- names( t )[[1]]
      colname2 <- names( t )[[2]]
      t %>%
        spread( !!as.name(colname1), !!as.name(colname2) ) %>%
        map( identity )
    }

    spread_to_list_2 <- function( t )
    {
      colname1 <- names( t )[[1]]
      colname2 <- names( t )[[2]]
      t %>%
        spread( !!as.name(colname1), !!as.name(colname2) ) %>%
        as.list
    }

    spread_to_list_3 <- function( t )
    {
      setNames( as.list( t[[2]] ), t[[1]] )
    }

    spread_to_list_4 <- function( t )
    {
      as.list( setNames( t[[2]], t[[1]] ) )
    }

    # Now try them.
    s1 <- spread_to_list_1( t )
    s2 <- spread_to_list_2( t )
    s3 <- spread_to_list_3( t )
    s4 <- spread_to_list_4( t )

    dput( s1 )
    dput( s2 )
    dput( s3 )
    dput( s4 )

    identical( s1, s2 )
    identical( s1, s3 )
    identical( s1, s4 )
    # They all return named lists, but the order
    # of elements is different for the spread()-based
    # versions than for the setNames()-based ones.
    # (That was obvious earlier, actually.)
    # So I'll sort the lists, then test that
    # they're identical. I'll also microbenchmark
    # the functions.
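Why do spread_to_list_3 and spread_to_list_4 give the same list? setNames() comes from the stats package. In effect it attaches names to its first argument and returns it, so the output has the same type as the input. The sketch below uses a hypothetical attach_names() that mirrors that behavior and checks it against the real setNames(). The article's four-row data is held in two plain vectors, keys and vals, rather than a tibble.

```r
# Hypothetical re-implementation of what stats::setNames() does:
# attach the names, then return the otherwise unchanged object.
attach_names <- function(object, nm) {
  names(object) <- nm
  object
}

keys <- c("x", "y", "z", "t")  # first column of the article's tibble
vals <- c(1, 2, 3, 4)          # second column

# Atomic vector in, named atomic vector out.
v <- attach_names(vals, keys)
is.list(v)   # FALSE

# List in, named list out (the spread_to_list_3 route).
l3 <- attach_names(as.list(vals), keys)

# Named vector converted afterwards (the spread_to_list_4 route);
# as.list() keeps the names.
l4 <- as.list(attach_names(vals, keys))

identical(l3, l4)                               # TRUE
identical(l3, setNames(as.list(vals), keys))    # TRUE
```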
    sort_list <- function(l)
    {
      sort( unlist( l ) )
    }

    identical( sort_list(s1), sort_list(s2) ) %>% show
    identical( sort_list(s1), sort_list(s3) ) %>% show
    identical( sort_list(s1), sort_list(s4) ) %>% show

    mbres <- microbenchmark( spread_to_list_1( t ) ,
                             spread_to_list_2( t ) ,
                             spread_to_list_3( t ) ,
                             spread_to_list_4( t )
                           )
    print( mbres )

    # Now let's microbenchmark the functions applied
    # to bigger tibbles. I'll generate random name-value
    # tibbles of sizes n, where n is defined by the
    # vector in the 'for' condition.

    for ( n in c(10,30,100,300) ) {

      cat( "Trying", n, "row tibble\n" )

      names <- replicate( n, str_c(sample(letters,5,replace=FALSE),collapse="") )
      #
      # Generate n random alphabetic strings.
      # From Dirk Eddelbuettel's answer to
      # https://stackoverflow.com/questions/1439513/creating-a-sequential-list-of-letters-with-r .

      values <- runif( n, 1, 100 )
      #
      # Generate n random values.

      t <- tibble( names=names , values=values )
      #
      # Use these to make a random tibble with
      # two columns and n rows.

      # Recompute the results for this tibble, so that
      # the checks below test it rather than the
      # four-row tibble from earlier.
      s1 <- spread_to_list_1( t )
      s2 <- spread_to_list_2( t )
      s3 <- spread_to_list_3( t )
      s4 <- spread_to_list_4( t )

      identical( sort_list(s1), sort_list(s2) ) %>% show
      identical( sort_list(s1), sort_list(s3) ) %>% show
      identical( sort_list(s1), sort_list(s4) ) %>% show

      mbres <- microbenchmark( spread_to_list_1( t ) ,
                               spread_to_list_2( t ) ,
                               spread_to_list_3( t ) ,
                               spread_to_list_4( t )
                             )
      print( mbres )
    }

    # Here are the microbenchmark results for the
    # 300-row tibble:
    #
    # Unit: microseconds
    #                 expr       min        lq        mean    median         uq       max neval
    #  spread_to_list_1(t) 17738.631 17928.946 18668.44595 18221.133 19532.8080 21477.003   100
    #  spread_to_list_2(t) 14582.521 14805.888 15386.05835 15050.685 16355.4180 17314.838   100
    #  spread_to_list_3(t)    36.223    40.901    46.67244    47.089    51.4655    64.294   100
    #  spread_to_list_4(t)    35.317    40.146    45.77894    45.882    51.6165    63.087   100
    #
    # So the two spread() versions are much slower.
    # Converting the spread() result with mapping is
    # slower than with as.list(), probably unsurprisingly.
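sort_list() compares the lists by unlisting them and sorting their values. That works for the numeric data here, but it orders by value rather than by name. unlist() would also flatten any element longer than one and coerce mixed values to a common type. One possible alternative is a hypothetical helper, sort_by_name(), sketched below. It reorders the list elements by name and leaves the values alone, and it assumes the names are unique.

```r
# Hypothetical helper: put list elements in alphabetical order of name,
# without unlisting, so values of any type or length survive intact.
# Assumes the names are unique.
sort_by_name <- function(l) {
  l[order(names(l))]
}

# The same name-value pairs in the two orders produced by
# the spread() versions and the setNames() versions.
from_spread   <- list(t = 4, x = 1, y = 2, z = 3)
from_setnames <- list(x = 1, y = 2, z = 3, t = 4)

identical(from_spread, from_setnames)                              # FALSE
identical(sort_by_name(from_spread), sort_by_name(from_setnames))  # TRUE

# Mixed value types are compared as they are, not coerced by unlist().
a <- list(name = "May", days = 31)
b <- list(days = 31, name = "May")
identical(sort_by_name(a), sort_by_name(b))                        # TRUE
```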
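    # The two setNames() versions are much faster:
    # going by the medians, roughly 300 to 400 times
    # faster than the spread() versions on the
    # 300-row tibble. It doesn't seem to matter whether
    # we type-convert to list by making setNames()'s
    # first argument a list, or by making its result one.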
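Since the setNames() versions win, here is one way spread_to_list might be packaged. The checks are assumptions about what a parameter sheet should look like; the benchmarks above don't test them. The sheet must have exactly two columns and no repeated names. Without that check, setNames() would quietly build a list with duplicate names, whereas spread() would be expected to stop with an error. As a precaution, the first column goes through as.character() so that a factor column yields its labels. The name spread_to_list_checked and the example data frame are hypothetical. A plain data.frame stands in for the tibble, since t[[1]] and t[[2]] behave the same on both.

```r
# Hypothetical packaged version of spread_to_list, based on the
# setNames() approach of spread_to_list_3. Works on a tibble or a
# plain data frame, since both support t[[1]] and t[[2]].
spread_to_list_checked <- function(t) {
  # Assumed requirement: one names column and one values column.
  if (length(t) != 2) {
    stop("expected exactly two columns, got ", length(t))
  }
  # Precaution: use the labels of a factor column, not its codes.
  keys <- as.character(t[[1]])
  # Assumed requirement: every name appears only once.
  if (anyDuplicated(keys) > 0) {
    stop("duplicated names: ",
         paste(unique(keys[duplicated(keys)]), collapse = ", "))
  }
  setNames(as.list(t[[2]]), keys)
}

# Example modelled on the comment at the top of the script.
days <- data.frame(
  names      = c("DaysInMay", "DaysInJune"),
  no_of_days = c(31, 30),
  stringsAsFactors = FALSE
)
spread_to_list_checked(days)
# same as list(DaysInMay = 31, DaysInJune = 30)
```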
To leave a comment for the author, please follow the link and comment on their blog: R – Jocelyn Ireson-Paine's Blog.