head(n = 10)
#> L1 L2 L3 L4 L5 value
#> 1 World Africa Northern Africa Algeria 0.08
#> 2 World Africa Northern Africa Egypt 5.69
#> 3 World Africa Northern Africa Libya 1.64
#> 4 World Africa Northern Africa Morocco 11.02
#> 5 World Africa Northern Africa Sudan 61.64
#> 6 World Africa Northern Africa Tunisia 12.47
#> 7 World Africa Northern Africa Western Sahara NA
#> 8 World Africa Sub-Saharan Africa Eastern Africa British Indian Ocean Territory NA
#> 9 World Africa Sub-Saharan Africa Eastern Africa Burundi 89.22
#> 10 World Africa Sub-Saharan Africa Eastern Africa Comoros 41.92

## drop logical NA's and melt to data.frame
rrapply(
  renewable_energy_by_country,
  classes = "numeric",
  how = "melt"
) |>
  head(n = 10)
#> L1 L2 L3 L4 L5 value
#> 1 World Africa Northern Africa Algeria 0.08
#> 2 World Africa Northern Africa Egypt 5.69
#> 3 World Africa Northern Africa Libya 1.64
#> 4 World Africa Northern Africa Morocco 11.02
#> 5 World Africa Northern Africa Sudan 61.64
#> 6 World Africa Northern Africa Tunisia 12.47
#> 7 World Africa Sub-Saharan Africa Eastern Africa Burundi 89.22
#> 8 World Africa Sub-Saharan Africa Eastern Africa Comoros 41.92
#> 9 World Africa Sub-Saharan Africa Eastern Africa Djibouti 28.50
#> 10 World Africa Sub-Saharan Africa Eastern Africa Eritrea 80.14

## apply condition and melt to data.frame
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xparents) "Western Europe" %in% .xparents,
  how = "melt"
) |>
  head(n = 10)
#> L1 L2 L3 L4 value
#> 1 World Europe Western Europe Austria 34.67
#> 2 World Europe Western Europe Belgium 9.14
#> 3 World Europe Western Europe France 14.74
#> 4 World Europe Western Europe Germany 14.17
#> 5 World Europe Western Europe Liechtenstein 62.93
#> 6 World Europe Western Europe Luxembourg 13.54
#> 7 World Europe Western Europe Monaco NA
#> 8 World Europe Western Europe Netherlands 5.78
#> 9 World Europe Western Europe Switzerland 25.49

As the examples above show, in contrast to reshape2::melt(), rrapply() can filter or transform list elements through the f, classes and condition arguments before melting the nested list.² More importantly, rrapply() is optimized specifically for nested lists, whereas reshape2::melt() was designed primarily for melting data.frames, before being superseded by tidyr::gather() and, more recently, tidyr::pivot_longer(). For this reason, reshape2::melt() can be quite slow when applied to large nested lists:

## melt to long data.frame (reshape2)
reshape2::melt(renewable_energy_by_country) |>
  head(10)
#> value L4 L5 L3 L2 L1
#> 1 0.08 Algeria Northern Africa Africa World
#> 2 5.69 Egypt Northern Africa Africa World
#> 3 1.64 Libya Northern Africa Africa World
#> 4 11.02 Morocco Northern Africa Africa World
#> 5 61.64 Sudan Northern Africa Africa World
#> 6 12.47 Tunisia Northern Africa Africa World
#> 7 NA Western Sahara Northern Africa Africa World
#> 8 NA Eastern Africa British Indian Ocean Territory Sub-Saharan Africa Africa World
#> 9 89.22 Eastern Africa Burundi Sub-Saharan Africa Africa World
#> 10 41.92 Eastern Africa Comoros Sub-Saharan Africa Africa World

## computation times
bench::mark(
  rrapply(renewable_energy_by_country),
  reshape2::melt(renewable_energy_by_country),
  check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> 1 rrapply(renewable_energy_by_country) 14.3µs 17µs 50340. 0B 20.1
#> 2 reshape2::melt(renewable_energy_by_country) 46ms 52.2ms 18.3 73.1KB 27.5

For a medium-sized list such as the one in this example, the computation time of reshape2::melt() is not a bottleneck in practice. However, the computational effort increases quickly when melting larger or more deeply nested lists:

## helper function to generate large nested list
new_list
#> .. ..$ : num 0.266
#> .. ..$ : num 0.372
#> .. .. [list output truncated]
#> ..$ :List of 100
#> .. ..$ : num 0.655
#> .. ..$ : num 0.353
#> .. .. [list output truncated]
#> .. [list output truncated]
#> $ :List of 100
#> ..$ :List of 100
#> .. ..$ : num 0.0647
#> .. ..$ : num 0.677
#> .. .. [list output truncated]
#> ..$ :List of 100
#> .. ..$ : num 0.266
#> .. ..$ : num 0.66
#> .. .. [list output truncated]
#> .. [list output truncated]
#> [list output truncated]

## benchmark timing with rrapply
system.time(shallow_melt
#> user system elapsed
#> 1.183 0.040 1.223
head(shallow_melt)
#> L1 L2 L3 value
#> 1 1 1 1 0.2655087
#> 2 1 1 2 0.3721239
#> 3 1 1 3 0.5728534
#> 4 1 1 4 0.9082078
#> 5 1 1 5 0.2016819
#> 6 1 1 6 0.8983897

## benchmark timing with reshape2::melt
system.time(shallow_melt_reshape2
#> user system elapsed
#> 163.969 0.036 164.019
head(shallow_melt_reshape2)
#> value L3 L2 L1
#> 1 0.2655087 1 1 1
#> 2 0.3721239 2 1 1
#> 3 0.5728534 3 1 1
#> 4 0.9082078 4 1 1
#> 5 0.2016819 5 1 1
#> 6 0.8983897 6 1 1

## generate large deeply nested list (2^18 elements)
deep_list
#> 0.761 0.008 0.769
head(deep_melt)
#> L1 L2 L3 L4 L5 L6 L7 L8 L9 L10 L11 L12 L13 L14 L15 L16 L17 L18 value
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.14011775
#> 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0.69562066
#> 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 0.72888445
#> 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0.09164734
#> 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 0.06661200
#> 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 0.61285721

## benchmark timing with reshape2::melt
system.time(deep_melt_reshape2
#> user system elapsed
#> 125.361 0.040 125.448
head(deep_melt_reshape2)
#> value L18 L17 L16 L15 L14 L13 L12 L11 L10 L9 L8 L7 L6 L5 L4 L3 L2 L1
#> 1 0.14011775 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 2 0.69562066 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 3 0.72888445 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 4 0.09164734 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 5 0.06661200 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 6 0.61285721 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Although such large or deeply nested lists are unlikely to be encountered in practice, these artificial examples illustrate that reshape2::melt() is not particularly efficient at unnesting large nested lists into data.frames.

Bind to wide data.frame

The option how = "bind" unnests a nested list into a wide data.frame and is intended for nested lists containing repeated entries of the same variables. To illustrate, we consider the pokedex dataset included in the rrapply-package: a nested list containing various property values for each of the 151 original Pokémon, available (in .json format) from https://github.com/Biuni/PokemonGO-Pokedex.

## all 151 Pokemon
str(pokedex, list.len = 3)
#> List of 1
#> $ pokemon:List of 151
#> ..$ :List of 16
#> .. ..$ id : int 1
#> .. ..$ num : chr "001"
#> .. ..$ name : chr "Bulbasaur"
#> .. .. [list output truncated]
#> ..$ :List of 17
#> .. ..$ id : int 2
#> .. ..$ num : chr "002"
#> .. ..$ name : chr "Ivysaur"
#> .. .. [list output truncated]
#> ..$ :List of 15
#> .. ..$ id : int 3
#> .. ..$ num : chr "003"
#> .. ..$ name : chr "Venusaur"
#> .. .. [list output truncated]
#> .. [list output truncated]

## single Pokemon entry
str(pokedex[["pokemon"]][[1]])
#> List of 16
#> $ id : int 1
#> $ num : chr "001"
#> $ name : chr "Bulbasaur"
#> $ img : chr "http://www.serebii.net/pokemongo/pokemon/001.png"
#> $ type : chr [1:2] "Grass" "Poison"
#> $ height : chr "0.71 m"
#> $ weight : chr "6.9 kg"
#> $ candy : chr "Bulbasaur Candy"
#> $ candy_count : int 25
#> $ egg : chr "2 km"
#> $ spawn_chance : num 0.69
#> $ avg_spawns : int 69
#> $ spawn_time : chr "20:00"
#> $ multipliers : num 1.58
#> $ weaknesses : chr [1:4] "Fire" "Ice" "Flying" "Psychic"
#> $ next_evolution:List of 2
#> ..$ :List of 2
#> .. ..$ num : chr "002"
#> .. ..$ name: chr "Ivysaur"
#> ..$ :List of 2
#> .. ..$ num : chr "003"
#> .. ..$ name: chr "Venusaur"

Calling rrapply() with how = "bind" expands each Pokémon sublist into a single row of a wide data.frame. The 151 rows are stacked and aligned by matching variable names, with missing entries replaced by NAs (similar to data.table::rbindlist(..., fill = TRUE)).
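As a rough illustration of this row-stacking step, the base R sketch below stacks a list of flat named lists, aligning columns by name and filling gaps with NA. The helper bind_fill() and the toy input are hypothetical; this is not how rrapply() implements how = "bind" (which also unnests deeper sublists and does its work in compiled code), and the sketch assumes every entry is a length-1 value.

```r
## Hypothetical helper bind_fill(): a base R sketch of stacking rows with NA fill,
## not rrapply()'s implementation. Assumes each row is a flat named list of
## length-1 values.
bind_fill <- function(rows) {
  ## union of all variable names, in order of first appearance
  cols <- unique(unlist(lapply(rows, names)))
  out <- lapply(cols, function(col) {
    ## one value per row; NA where a row lacks this variable
    unlist(lapply(rows, function(row) if (is.null(row[[col]])) NA else row[[col]]))
  })
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

## toy input: two Pokémon-like entries, the second one without candy_count
rows <- list(
  list(id = 1L, name = "Bulbasaur", candy_count = 25L),
  list(id = 3L, name = "Venusaur")
)
bind_fill(rows)
#>   id      name candy_count
#> 1  1 Bulbasaur          25
#> 2  3  Venusaur          NA
```

Aligning by name rather than by position is what lets entries with different numbers of fields (16, 17 or 15 in pokedex) end up in the same columns.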
Note that any nested variables, such as next_evolution and prev_evolution, are unnested as wide as possible into individual data.frame columns, similar to repeated application of tidyr::unnest_wider() to a data.frame with nested list-columns.

rrapply(pokedex, how = "bind")[, 1:9] |>
  head()
#> id num name img type height weight candy candy_count
#> 1 1 001 Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png Grass, Poison 0.71 m 6.9 kg Bulbasaur Candy 25
#> 2 2 002 Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png Grass, Poison 0.99 m 13.0 kg Bulbasaur Candy 100
#> 3 3 003 Venusaur http://www.serebii.net/pokemongo/pokemon/003.png Grass, Poison 2.01 m 100.0 kg Bulbasaur Candy NA
#> 4 4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png Fire 0.61 m 8.5 kg Charmander Candy 25
#> 5 5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png Fire 1.09 m 19.0 kg Charmander Candy 100
#> 6 6 006 Charizard http://www.serebii.net/pokemongo/pokemon/006.png Fire, Flying 1.70 m 90.5 kg Charmander Candy NA

By default, the list layer containing the repeated observations is identified by the minimal depth detected across leaf elements. This default can be overridden through the coldepth parameter in the options argument, which is useful for unnesting deeper sublists, such as next_evolution or prev_evolution. In addition, setting namecols = TRUE in the options argument adds the parent list names associated with each row of the wide data.frame as individual columns L1, L2, etc.
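To make the default choice of row layer more concrete, the sketch below computes the depth of every leaf in a small mock list shaped like pokedex, counting the elements of the top-level list as depth 1. Both leaf_depths() and the mock data are hypothetical. Reading coldepth as the depth of the leaves that become columns (an assumption, but one that matches coldepth = 5 for the evolution sublists in this post), the minimum leaf depth would correspond to the default.

```r
## Hypothetical helper leaf_depths(): depth of every leaf in a nested list,
## where elements of the top-level list are at depth 1
leaf_depths <- function(x, depth = 1L) {
  unlist(lapply(x, function(el) {
    ## recurse into sublists, otherwise record the current depth
    if (is.list(el)) leaf_depths(el, depth + 1L) else depth
  }), use.names = FALSE)
}

## mock data mimicking the pokedex layout (not the real dataset)
mock <- list(pokemon = list(
  list(id = 1L, name = "Bulbasaur",
       next_evolution = list(list(num = "002", name = "Ivysaur"))),
  list(id = 2L, name = "Ivysaur")
))

leaf_depths(mock)
#> [1] 3 3 5 5 3 3
min(leaf_depths(mock))
#> [1] 3
```

Here id and name sit at depth 3, so the rows are the Pokémon entries at depth 2, while the num and name fields of next_evolution sit at depth 5.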
## bind prev/next evolution columns
rrapply(
  pokedex,
  how = "bind",
  options = list(coldepth = 5, namecols = TRUE)
) |>
  head(n = 10)
#> L1 L2 L3 L4 num name
#> 1 pokemon 1 next_evolution 1 002 Ivysaur
#> 2 pokemon 1 next_evolution 2 003 Venusaur
#> 3 pokemon 2 prev_evolution 1 001 Bulbasaur
#> 4 pokemon 2 next_evolution 1 003 Venusaur
#> 5 pokemon 3 prev_evolution 1 001 Bulbasaur
#> 6 pokemon 3 prev_evolution 2 002 Ivysaur
#> 7 pokemon 4 next_evolution 1 005 Charmeleon
#> 8 pokemon 4 next_evolution 2 006 Charizard
#> 9 pokemon 5 prev_evolution 1 004 Charmander
#> 10 pokemon 5 next_evolution 1 006 Charizard

Common alternatives

Common alternatives for unnesting lists with repeated entries include data.table::rbindlist(), dplyr::bind_rows(), and tidyr’s dedicated rectangling functions unnest_longer(), unnest_wider() and hoist(). The first two functions are primarily aimed at binding lists of data.frames or lists of lists, but are not meant for lists with multiple levels of nesting, such as pokedex:

library(dplyr)

## simple list of lists
lapply(pokedex[["pokemon"]], `[`, 1:4) |>
  bind_rows() |>
  head()
#> # A tibble: 6 × 4
#> id num name img
#> 1 1 001 Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png
#> 2 2 002 Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png
#> 3 3 003 Venusaur http://www.serebii.net/pokemongo/pokemon/003.png
#> 4 4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png
#> 5 5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png
#> 6 6 006 Charizard http://www.serebii.net/pokemongo/pokemon/006.png

## complex nested list (error)
bind_rows(pokedex[["pokemon"]])
#> Error in `vctrs::data_frame()`:
#> ! Can't recycle `id` (size 2) to match `weaknesses` (size 4).
## simple list of lists
lapply(pokedex[["pokemon"]], `[`, 1:4) |>
  data.table::rbindlist() |>
  head()
#> id num name img
#> 1: 1 001 Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png
#> 2: 2 002 Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png
#> 3: 3 003 Venusaur http://www.serebii.net/pokemongo/pokemon/003.png
#> 4: 4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png
#> 5: 5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png
#> 6: 6 006 Charizard http://www.serebii.net/pokemongo/pokemon/006.png

## complex nested list (error)
data.table::rbindlist(pokedex[["pokemon"]])
#> Error in data.table::rbindlist(pokedex[["pokemon"]]): Column 5 of item 1 is length 2 inconsistent with column 15 which is length 4. Only length-1 columns are recycled.

The rectangling functions in the tidyr-package offer much more flexibility. A data.frame similar to the one returned by rrapply(pokedex, how = "bind") can be obtained by repeated application of tidyr::unnest_wider():

library(tidyr)
library(tibble)

as_tibble(pokedex) |>
  unnest_wider(pokemon) |>
  unnest_wider(next_evolution, names_sep = ".") |>
  unnest_wider(prev_evolution, names_sep = ".") |>
  unnest_wider(next_evolution.1, names_sep = ".") |>
  unnest_wider(next_evolution.2, names_sep = ".") |>
  unnest_wider(next_evolution.3, names_sep = ".") |>
  unnest_wider(prev_evolution.1, names_sep = ".") |>
  unnest_wider(prev_evolution.2, names_sep = ".") |>
  head()
#> # A tibble: 6 × 25
#> id num name img type height weight candy candy…¹ egg spawn…² avg_s…³ spawn…⁴ multi…⁵ weakn…⁶ next_…⁷ next_…⁸ next_…⁹ next_…˟ next_…˟
#> 1 1 001 Bulbasaur http… 0.71 m 6.9 kg Bulb… 25 2 km 0.69 69 20:00 002 Ivysaur 003 Venusa…
#> 2 2 002 Ivysaur http… 0.99 m 13.0 … Bulb… 100 Not … 0.042 4.2 07:00 003 Venusa…
#> 3 3 003 Venusaur http… 2.01 m 100.0… Bulb… NA Not … 0.017 1.7 11:30
#> 4 4 004 Charmander http… 0.61 m 8.5 kg Char… 25 2 km 0.253 25.3 08:45 005 Charme… 006 Chariz…
#> 5 5 005 Charmeleon http… 1.09 m 19.0 … Char… 100 Not … 0.012 1.2 19:00 006 Chariz…
#> 6 6 006 Charizard http… 1.70 m 90.5 … Char… NA Not … 0.0031 0.31 13:34
#> # … with 5 more variables: next_evolution.3.name , prev_evolution.1.num , prev_evolution.1.name , prev_evolution.2.num ,
#> # prev_evolution.2.name , and abbreviated variable names ¹candy_count, ²spawn_chance, ³avg_spawns, ⁴spawn_time, ⁵multipliers, ⁶weaknesses,
#> # ⁷next_evolution.1.num, ⁸next_evolution.1.name, ⁹next_evolution.2.num, ˟next_evolution.2.name, ˟next_evolution.3.num
#> # ℹ Use `colnames()` to see all variable names

The option how = "bind" in rrapply() is less flexible, as it always expands the nested list to a data.frame that is as wide as possible. On the other hand, the flexibility and interpretability of tidyr’s rectangling functions come at the cost of increased computational effort, which can become a bottleneck when unnesting large nested lists:

## large replicated pokedex list
pokedex_large
#> user system elapsed
#> 2.155 0.044 2.204

## unnest first layers prev_evolution and next_evolution
system.time({
  as_tibble(pokedex_large) |>
    unnest_wider(pokemon) |>
    unnest_wider(next_evolution, names_sep = ".") |>
    unnest_wider(prev_evolution, names_sep = ".")
})
#> user system elapsed
#> 126.633 0.320 127.060

Remark: in the chained calls to unnest_wider() above, we only unnest the first layer of the next_evolution and prev_evolution list-columns and none of the resulting child list-columns; unnesting those as well would further increase computation time.
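Conceptually, choosing a deeper column layer amounts to collecting all sublists at a fixed depth, together with the names of their parents. The base R sketch below illustrates this idea on a small mock list; rows_at_depth() and the mock data are hypothetical, and the sketch is not rrapply()'s implementation of options = list(coldepth = ..., namecols = TRUE). It assumes that every collected sublist holds length-1 leaves with the same names, so that the rows can simply be combined with rbind().

```r
## Hypothetical helper rows_at_depth(): collect every sublist at depth `rowdepth`
## (elements of the top-level list are at depth 1), prefixed with parent names
## L1, L2, ...; unnamed elements get placeholder names 1, 2, 3, ...
## Assumes each collected sublist holds length-1 leaves with the same names.
rows_at_depth <- function(x, rowdepth, path = character(0)) {
  if (!is.list(x)) return(list())
  if (length(path) == rowdepth) {
    ## x is a row: parent names followed by its leaf values
    parents <- as.list(path)
    names(parents) <- paste0("L", seq_along(path))
    return(list(as.data.frame(c(parents, x), stringsAsFactors = FALSE)))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- character(length(x))
  nms[nms == ""] <- as.character(seq_along(x))[nms == ""]
  out <- list()
  for (i in seq_along(x)) {
    out <- c(out, rows_at_depth(x[[i]], rowdepth, c(path, nms[i])))
  }
  out
}

## mock data mimicking the pokedex layout (not the real dataset)
mock <- list(pokemon = list(
  list(id = 1L, name = "Bulbasaur",
       next_evolution = list(list(num = "002", name = "Ivysaur"),
                             list(num = "003", name = "Venusaur"))),
  list(id = 2L, name = "Ivysaur",
       prev_evolution = list(list(num = "001", name = "Bulbasaur")))
))

## sublists at depth 4 become rows, their leaves at depth 5 become columns
ev <- do.call(rbind, rows_at_depth(mock, rowdepth = 4))
ev
```

The result has the same layout as the coldepth = 5 example earlier in this section: path columns L1 to L4 followed by num and name.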
To extract and unnest sublists at deeper levels of nesting, such as next_evolution, we manually set the coldepth parameter in the options argument, as demonstrated above:

system.time({
  ev1
#> user system elapsed
#> 1.837 0.000 1.837
head(ev1)
#> L1 L2 L3 L4 num name
#> 1 pokemon 1 next_evolution 1 002 Ivysaur
#> 2 pokemon 1 next_evolution 2 003 Venusaur
#> 3 pokemon 2 next_evolution 1 003 Venusaur
#> 4 pokemon 4 next_evolution 1 005 Charmeleon
#> 5 pokemon 4 next_evolution 2 006 Charizard
#> 6 pokemon 5 next_evolution 1 006 Charizard

The same unnested version of the next_evolution sublists can be obtained by combining several calls to unnest_wider() and unnest_longer():

system.time({
  ev2
    unnest_wider(pokemon) |>
    unnest_longer(next_evolution) |>
    unnest_wider(next_evolution, names_sep = "_") |>
    select(id, next_evolution_num, next_evolution_name)
})
#> user system elapsed
#> 96.874 0.040 96.941
head(ev2)
#> # A tibble: 6 × 3
#> id next_evolution_num next_evolution_name
#> 1 1 002 Ivysaur
#> 2 1 003 Venusaur
#> 3 2 003 Venusaur
#> 4 3
#> 5 4 005 Charmeleon
#> 6 4 006 Charizard

In the context of the current example, a more efficient approach is to combine unnest_wider() with hoist(). The disadvantage is that we need to manually specify the exact locations of the elements that we wish to hoist from the nested list:

system.time({
  ev3
    unnest_wider(pokemon) |>
    hoist(next_evolution,
      name.1 = list(1, "name"),
      name.2 = list(2, "name"),
      name.3 = list(3, "name")
    ) |>
    select(id, name.1, name.2, name.3)
})
#> user system elapsed
#> 42.608 0.124 42.734
head(ev3)
#> # A tibble: 6 × 4
#> id name.1 name.2 name.3
#> 1 1 Ivysaur Venusaur
#> 2 2 Venusaur
#> 3 3
#> 4 4 Charmeleon Charizard
#> 5 5 Charizard
#> 6 6

Using rrapply(), the same result can be obtained by adding a call to reshape() (or an alternative such as
tidyr::pivot_wider() or data.table::dcast()) to convert the long data.frame into a wide one:

system.time({
  ev4
#> 2.212 0.000 2.212
head(ev5)
#> L2 name.1 name.2 name.3
#> 1 1 Ivysaur Venusaur
#> 3 2 Venusaur
#> 4 4 Charmeleon Charizard
#> 6 5 Charizard
#> 7 7 Wartortle Blastoise
#> 9 8 Blastoise

Additional examples

We conclude this section by replicating some of the data rectangling examples presented in the tidyr vignette: https://tidyr.tidyverse.org/articles/rectangle.html. The example nested lists are all conveniently included in the repurrrsive-package.

GitHub users

library(repurrrsive)

## nested data
str(gh_users, list.len = 3)
#> List of 6
#> $ :List of 30
#> ..$ login : chr "gaborcsardi"
#> ..$ id : int 660288
#> ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/660288?v=3"
#> .. [list output truncated]
#> $ :List of 30
#> ..$ login : chr "jennybc"
#> ..$ id : int 599454
#> ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/599454?v=3"
#> .. [list output truncated]
#> $ :List of 30
#> ..$ login : chr "jtleek"
#> ..$ id : int 1571674
#> ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/1571674?v=3"
#> .. [list output truncated]
#> [list output truncated]

## unnested version
rrapply(gh_users, how = "bind") |>
  as_tibble()
#> # A tibble: 6 × 30
#> login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷ subsc…⁸ organ…⁹ repos…˟ event…˟ recei…˟ type site_…˟ name company blog
#> 1 gabo… 6.60e5 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Gábo… http…
#> 2 jenn… 5.99e5 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Jenn… http…
#> 3 jtle… 1.57e6 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Jeff… http…
#> 4 juli… 1.25e7 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Juli… juli…
#> 5 leep… 3.51e6 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Thom… http…
#> 6 masa… 8.36e6 https:… "" http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User FALSE Maël… http…
#> # … with 10 more variables: location , email , hireable , bio , public_repos , public_gists , followers ,
#> # following , created_at , updated_at , and abbreviated variable names ¹avatar_url, ²gravatar_id, ³html_url, ⁴followers_url,
#> # ⁵following_url, ⁶gists_url, ⁷starred_url, ⁸subscriptions_url, ⁹organizations_url, ˟repos_url, ˟events_url, ˟received_events_url, ˟site_admin
#> # ℹ Use `colnames()` to see all variable names

GitHub repos

## nested data
str(gh_repos, list.len = 2)
#> List of 6
#> $ :List of 30
#> ..$ :List of 68
#> .. ..$ id : int 61160198
#> .. ..$ name : chr "after"
#> .. .. [list output truncated]
#> ..$ :List of 68
#> .. ..$ id : int 40500181
#> .. ..$ name : chr "argufy"
#> .. .. [list output truncated]
#> .. [list output truncated]
#> $ :List of 30
#> ..$ :List of 68
#> .. ..$ id : int 14756210
#> .. ..$ name : chr "2013-11_sfu"
#> .. .. [list output truncated]
#> ..$ :List of 68
#> .. ..$ id : int 14152301
#> .. ..$ name : chr "2014-01-27-miami"
#> .. .. [list output truncated]
#> .. [list output truncated]
#> [list output truncated]

## unnested version
rrapply(gh_repos, how = "bind") |>
  as_tibble()
#> # A tibble: 176 × 84
#> id name full_…¹ owner…² owner…³ owner…⁴ owner…⁵ owner…⁶ owner…⁷ owner…⁸ owner…⁹ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟
#> 1 61160198 after gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 2 40500181 argufy gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 3 36442442 ask gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 4 34924886 baseimpor… gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 5 61620661 citest gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 6 33907457 clisymbols gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 7 37236467 cmaker gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 8 67959624 cmark gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 9 63152619 conditions gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 10 24343686 crayon gaborc… gaborc… 660288 https:… "" https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> # … with 166 more rows, 66 more variables: owner.type , owner.site_admin , private , html_url , description , fork ,
#> # url , forks_url , keys_url , collaborators_url , teams_url , hooks_url , issue_events_url , events_url ,
#> # assignees_url , branches_url , tags_url , blobs_url , git_tags_url , git_refs_url , trees_url ,
#> # statuses_url , languages_url , stargazers_url , contributors_url , subscribers_url , subscription_url ,
#> # commits_url , git_commits_url , comments_url , issue_comment_url , contents_url , compare_url , merges_url ,
#> # archive_url , downloads_url , issues_url , pulls_url , milestones_url , notifications_url , labels_url ,
#> # releases_url , deployments_url , created_at , updated_at , pushed_at , git_url , ssh_url , clone_url , …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Game of Thrones characters

## nested data
str(got_chars, list.len = 3)
#> List of 30
#> $ :List of 18
#> ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1022"
#> ..$ id : int 1022
#> ..$ name : chr "Theon Greyjoy"
#> .. [list output truncated]
#> $ :List of 18
#> ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1052"
#> ..$ id : int 1052
#> ..$ name : chr "Tyrion Lannister"
#> .. [list output truncated]
#> $ :List of 18
#> ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1074"
#> ..$ id : int 1074
#> ..$ name : chr "Victarion Greyjoy"
#> .. [list output truncated]
#> [list output truncated]

## unnested version
rrapply(got_chars, how = "bind") |>
  as_tibble()
#> # A tibble: 30 × 18
#> url id name gender culture born died alive titles aliases father mother spouse alleg…¹ books povBo…² tvSer…³ playe…⁴
#> 1 https://www.anapioficeandfi… 1022 Theo… Male "Ironb… "In … "" TRUE "" "" ""
#> 2 https://www.anapioficeandfi… 1052 Tyri… Male "" "In … "" TRUE "" "" "http…
#> 3 https://www.anapioficeandfi… 1074 Vict… Male "Ironb… "In … "" TRUE "" "" ""
#> 4 https://www.anapioficeandfi… 1109 Will Male "" "" "In … FALSE "" "" ""
#> 5 https://www.anapioficeandfi… 1166 Areo… Male "Norvo… "In … "" TRUE "" "" ""
#> 6 https://www.anapioficeandfi… 1267 Chett Male "" "At … "In … FALSE "" "" ""
#> 7 https://www.anapioficeandfi… 1295 Cres… Male "" "In … "In … FALSE "" "" ""
#> 8 https://www.anapioficeandfi… 130 Aria… Female "Dorni… "In … "" TRUE "" "" ""
#> 9 https://www.anapioficeandfi… 1303 Daen… Female "Valyr… "In … "" TRUE "" "" "http…
#> 10 https://www.anapioficeandfi… 1319 Davo… Male "Weste… "In … "" TRUE "" "" "http…
#> # … with 20 more rows, and abbreviated variable names ¹allegiances, ²povBooks, ³tvSeries, ⁴playedBy
#> # ℹ Use `print(n = ...)` to see more rows

Sharla Gelfand’s discography

## nested data (first element)
str(discog[1], list.len = 3)
#> List of 1
#> $ :List of 5
#> ..$ instance_id : int 354823933
#> ..$ date_added : chr "2019-02-16T17:48:59-08:00"
#> ..$ basic_information:List of 11
#> .. ..$ labels :List of 1
#> .. .. ..$ :List of 6
#> .. .. .. ..$ name : chr "Tobi Records (2)"
#> .. .. .. ..$ entity_type : chr "1"
#> .. .. .. ..$ catno : chr "TOB-013"
#> .. .. .. .. [list output truncated]
#> .. ..$ year : int 2015
#> .. ..$ master_url : NULL
#> .. .. [list output truncated]
#> .. [list output truncated]

## unnested version (excluding deeply nested sublists)
discs
#> # A tibble: 155 × 12
#> instance_id date_added basic_information.year basic_information.mast…¹ basic…² basic…³ basic…⁴ basic…⁵ basic…⁶ basic…⁷ id rating
#> 1 354823933 2019-02-16T17:48:59-08:00 2015 7.50e6 "https… Demo https:… https:… 0 7.50e6 0
#> 2 354092601 2019-02-13T14:13:11-08:00 2013 https://api.discogs.com… 4.49e6 "https… Observ… https:… https:… 553057 4.49e6 0
#> 3 354091476 2019-02-13T14:07:23-08:00 2017 https://api.discogs.com… 9.83e6 "https… I https:… https:… 1109943 9.83e6 0
#> 4 351244906 2019-02-02T11:39:58-08:00 2017 https://api.discogs.com… 9.77e6 "https… Oído A… https:… https:… 1128934 9.77e6 0
#> 5 351244801 2019-02-02T11:39:37-08:00 2015 https://api.discogs.com… 7.24e6 "https… A Cat'… https:… https:… 857592 7.24e6 0
#> 6 351052065 2019-02-01T20:40:53-08:00 2019 https://api.discogs.com… 1.31e7 "https… Tashme https:… https:… 1498137 1.31e7 0
#> 7 350315345 2019-01-29T15:48:37-08:00 2014 https://api.discogs.com… 7.11e6 "https… Demo https:… https:… 852880 7.11e6 0
#> 8 350315103 2019-01-29T15:47:22-08:00 2015 https://api.discogs.com… 1.05e7 "https… Let Th… https:… https:… 869410 1.05e7 0
#> 9 350314507 2019-01-29T15:44:08-08:00 2017 https://api.discogs.com… 1.13e7 "" Sub Sp… https:… https:… 1281224 1.13e7 0
#> 10 350314047 2019-01-29T15:41:35-08:00 2017 1.17e7 "https… Demo https:… https:… 0 1.17e7 0
#> # … with 145 more rows, and abbreviated variable names ¹basic_information.master_url, ²basic_information.id, ³basic_information.thumb,
#> # ⁴basic_information.title, ⁵basic_information.cover_image, ⁶basic_information.resource_url, ⁷basic_information.master_id
#> # ℹ Use `print(n = ...)` to see more rows

## unnest labels sublists
labels
#> # A tibble: 182 × 10
#> L1 L2 L3 L4 name entity_type catno resource_url id entit…¹
#> 1 1 basic_information labels 1 Tobi Records (2) 1 TOB-013 https://api.discogs.com/labels/6… 633407 Label
#> 2 2 basic_information labels 1 La Vida Es Un Mus 1 Mus70 https://api.discogs.com/labels/3… 38322 Label
#> 3 3 basic_information labels 1 La Vida Es Un Mus 1 MUS118 https://api.discogs.com/labels/3… 38322 Label
#> 4 4 basic_information labels 1 La Vida Es Un Mus 1 MUS132 https://api.discogs.com/labels/3… 38322 Label
#> 5 4 basic_information labels 2 Beat Generation 1 BEAT64 https://api.discogs.com/labels/8… 88198 Label
#> 6 4 basic_information labels 3 Beat Generation 1 BEAT 64 https://api.discogs.com/labels/8… 88198 Label
#> 7 5 basic_information labels 1 Katorga Works 1 KW-043 https://api.discogs.com/labels/2… 205895 Label
#> 8 6 basic_information labels 1 High Fashion Industries 1 HFI017 https://api.discogs.com/labels/6… 637837 Label
#> 9 7 basic_information labels 1 Mind Control Records (6) 1 none https://api.discogs.com/labels/7… 763103 Label
#> 10 8 basic_information labels 1 Not On Label (Phantom Head Self-released) 1 none https://api.discogs.com/labels/8… 879916 Label
#> # … with 172 more rows, and abbreviated variable name ¹entity_type_name
#> # ℹ Use `print(n = ...)` to see more rows

## merge disc id's with labels
merge(
  x = data.frame(L1 = rownames(discs), disc_id = discs[, "id"]),
  y = labels,
  by = "L1",
  sort = FALSE
) |>
  as_tibble()
#> # A tibble: 182 × 11
#> L1 disc_id L2 L3 L4 name entity_type catno resource_url id entit…¹
#> 1 1 7496378 basic_information labels 1 Tobi Records (2) 1 TOB-013 https://api.discogs.com… 633407 Label
#> 2 2 4490852 basic_information labels 1 La Vida Es Un Mus 1 Mus70 https://api.discogs.com… 38322 Label
#> 3 3 9827276 basic_information labels 1 La Vida Es Un Mus 1 MUS118 https://api.discogs.com… 38322 Label
#> 4 4 9769203 basic_information labels 1 La Vida Es Un Mus 1 MUS132 https://api.discogs.com… 38322 Label
#> 5 4 9769203 basic_information labels 2 Beat Generation 1 BEAT64 https://api.discogs.com… 88198 Label
#> 6 4 9769203 basic_information labels 3 Beat Generation 1 BEAT 64 https://api.discogs.com… 88198 Label
#> 7 5 7237138 basic_information labels 1 Katorga Works 1 KW-043 https://api.discogs.com… 205895 Label
#> 8 6 13117042 basic_information labels 1 High Fashion Industries 1 HFI017 https://api.discogs.com… 637837 Label
#> 9 7 7113575 basic_information labels 1 Mind Control Records (6) 1 none https://api.discogs.com… 763103 Label
#> 10 8 10540713 basic_information labels 1 Not On Label (Phantom Head Self-released) 1 none https://api.discogs.com… 879916 Label
#> # … with 172 more rows, and abbreviated variable name ¹entity_type_name
#> # ℹ Use `print(n = ...)` to see more rows

Data.frame to nested list

As an example, we revisit the long data.frame from the first section, obtained by melting the renewable energy shares of all Western European countries:

renewable_energy_melt_west_eu
#> L1 L2 L3 L4 value
#> 1 World Europe Western Europe Austria 34.67
#> 2 World Europe Western Europe Belgium 9.14
#> 3 World Europe Western Europe France 14.74
#> 4 World Europe Western Europe Germany 14.17
#> 5 World Europe Western Europe Liechtenstein 62.93
#> 6 World Europe Western Europe Luxembourg 13.54
#> 7 World Europe Western Europe Monaco NA
#> 8 World Europe Western Europe Netherlands 5.78
#> 9 World Europe Western Europe Switzerland 25.49

For certain tasks, it may be necessary to convert this data.frame back into a nested list, e.g., to write the data to a JSON or XML object, or for tree visualization purposes. Writing a recursive function to reconstruct the nested list can prove quite time-consuming and error-prone. In this context, the unlist() function has an inverse counterpart, relist(), that reconstructs a nested list from the unlisted vector. However, relist() always requires a skeleton nested list to repopulate, which can make it difficult to use in practice; in the current example, for instance, no such skeleton object is available.
In particular, the melted data.frame contains only a subset of the original list elements, so we cannot use the original list as a template without also filtering nodes from the original list.

Unmelt to nested list

To address this difficulty, rrapply() includes the dedicated option how = "unmelt", which performs the inverse operation of how = "melt". No skeleton object is needed in this case, only a plain data.frame in the format returned by how = "melt". To illustrate, we can convert the melted data.frame above to a nested list as follows:

rrapply(
  renewable_energy_melt_west_eu,
  how = "unmelt"
) |>
  str(give.attr = FALSE)
#> List of 1
#> $ World:List of 1
#> ..$ Europe:List of 1
#> .. ..$ Western Europe:List of 9
#> .. .. ..$ Austria : num 34.7
#> .. .. ..$ Belgium : num 9.14
#> .. .. ..$ France : num 14.7
#> .. .. ..$ Germany : num 14.2
#> .. .. ..$ Liechtenstein: num 62.9
#> .. .. ..$ Luxembourg : num 13.5
#> .. .. ..$ Monaco : num NA
#> .. .. ..$ Netherlands : num 5.78
#> .. .. ..$ Switzerland : num 25.5

Remark 1: how = "unmelt" is based on a greedy approach that parses data.frame rows as list elements, starting from the top of the data.frame. That is, rrapply() keeps collecting child nodes as long as the parent node name remains unchanged. If, for instance, the goal is to create two separate nodes (on the same level) with the name "Western Europe", these nodes should not be listed directly after one another in the melted data.frame, as rrapply() would otherwise group all their children under a single "Western Europe" list element.

Remark 2: Internally, how = "unmelt" reconstructs a nested list from the melted data.frame and subsequently follows the same conceptual framework as how = "replace". Any other function arguments, such as f and condition, can be used in exactly the same way as when applying how = "replace" to a nested list object.
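To illustrate the greedy grouping described in Remark 1, the base R sketch below pairs a minimal melt with a minimal unmelt that turns each run of consecutive identical names into a single list node. The helpers collect_leaves(), melt_sketch() and unmelt_sketch() are hypothetical and are not rrapply()'s implementation: they assume a fully named nested list with length-1 leaves that all sit at the same depth, and they ignore the f and condition arguments mentioned in Remark 2.

```r
## Hypothetical helpers, not rrapply()'s implementation.
## Assumes: fully named nested list, length-1 leaves, all leaves at the same depth.

## collect (path, value) pairs for every leaf
collect_leaves <- function(x, path = character(0)) {
  if (!is.list(x)) return(list(list(path = path, value = x)))
  do.call(c, lapply(seq_along(x), function(i) collect_leaves(x[[i]], c(path, names(x)[i]))))
}

## melt: one row per leaf, name columns L1, L2, ... plus a value column
melt_sketch <- function(x) {
  leaves <- collect_leaves(x)
  paths <- lapply(leaves, `[[`, "path")
  depth <- max(lengths(paths))
  out <- lapply(seq_len(depth), function(d) vapply(paths, function(p) p[d], character(1)))
  names(out) <- paste0("L", seq_len(depth))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  out$value <- unlist(lapply(leaves, `[[`, "value"))
  out
}

## unmelt: greedy, a run of consecutive identical names at a level becomes one node
unmelt_sketch <- function(df) {
  name_cols <- grep("^L[0-9]+$", names(df), value = TRUE)
  build <- function(rows, level) {
    runs <- rle(df[[name_cols[level]]][rows])
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1L
    nodes <- lapply(seq_along(runs$values), function(i) {
      idx <- rows[starts[i]:ends[i]]
      ## last name column: return the leaf value, otherwise recurse one level down
      if (level == length(name_cols)) df$value[idx] else build(idx, level + 1L)
    })
    names(nodes) <- runs$values
    nodes
  }
  build(seq_len(nrow(df)), 1L)
}

## round trip on a toy list
nested <- list(World = list(
  Europe = list(A = 1.5, B = 2),
  Asia = list(C = 3)
))
long <- melt_sketch(nested)
identical(unmelt_sketch(long), nested)
#> [1] TRUE

## Remark 1: non-adjacent rows with the same name become separate nodes
split_df <- data.frame(
  L1 = c("Europe", "Asia", "Europe"),
  L2 = c("A", "C", "B"),
  value = c(1.5, 3, 2),
  stringsAsFactors = FALSE
)
names(unmelt_sketch(split_df))
#> [1] "Europe" "Asia"   "Europe"
```

Grouping only consecutive runs keeps the reconstruction a single linear pass over the rows per level, which is why row order in the melted data.frame matters.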
Remark 3: how = "unmelt" does (currently) not restore the attributes of intermediate list nodes and is therefore not an exact inverse of how = "melt". The other way around always produces the same result: renewable_energy_unmelt 0.142 0.000 0.143 ## benchmark timing with relist deep_unlist 2.978 0.000 2.978 ## large shallow list (10^6 elements) ## benchmark timing with rrapply system.time(shallow_unmelt user system elapsed #> 0.084 0.000 0.085 ## benchmark timing with relist shallow_unlist 5.121 0.000 5.122 Note: the unmelted lists are not exactly identical to the original nested lists, since how = "unmelt" uses the placeholder names 1, 2, 3, etc. in the melted data.frames to name the nodes in the newly constructed lists, whereas the name attributes in the original lists are all empty. By removing all names from the unmelted lists, they become identical to their original counterparts: ## remove all list names deep_unmelt_unnamed [1] TRUE ## remove all list names shallow_unmelt_unnamed [1] TRUE References The latest stable version of the rrapply-package is available on CRAN. Additional details and examples on how to use the rrapply() function can be found at https://jorischau.github.io/rrapply/ and a quick reference sheet can be downloaded from the github repository at https://github.com/JorisChau/rrapply/. 
Session Info sessionInfo() #> R version 4.2.1 (2022-06-23) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Ubuntu 20.04.4 LTS #> #> Matrix products: default #> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 #> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 #> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] repurrrsive_1.0.0 tibble_3.1.7 tidyr_1.2.0 dplyr_1.0.9 rrapply_1.2.5 #> #> loaded via a namespace (and not attached): #> [1] Rcpp_1.0.9 bslib_0.3.1 compiler_4.2.1 pillar_1.8.0 jquerylib_0.1.4 plyr_1.8.7 tools_4.2.1 digest_0.6.29 #> [9] jsonlite_1.8.0 evaluate_0.15 lifecycle_1.0.1 pkgconfig_2.0.3 rlang_1.0.4 bench_1.1.2 cli_3.3.0 rstudioapi_0.13 #> [17] yaml_2.3.5 blogdown_1.10 xfun_0.31 fastmap_1.1.0 stringr_1.4.0 knitr_1.39 generics_0.1.3 sass_0.4.1 #> [25] vctrs_0.4.1 tidyselect_1.1.2 data.table_1.14.2 glue_1.6.2 R6_2.5.1 fansi_1.0.3 profmem_0.6.0 rmarkdown_2.14 #> [33] bookdown_0.27 purrr_0.3.4 reshape2_1.4.4 magrittr_2.0.3 htmltools_0.5.2 ellipsis_0.3.2 utf8_1.2.2 stringi_1.7.8 The renewable_energy_by_country dataset is publicly available at the United Nations Open SDG Data Hub↩︎ Note that rrapply() imposes a different column order than reshape2::melt() and the "value" column may follow slightly different coercion rules, but other than that the melted data.frames are the same.↩︎ " />

Efficient list melting and unnesting with {rrapply}

[This article was first published on R-bloggers | A Random Walk, and kindly contributed to R-bloggers].


Introduction

The previous post showcased the rrapply() function in the minimal rrapply-package as a revised and extended version of base rapply() for recursion over nested lists in R. For quick exploration of a nested list, it can make sense to keep the list in its original nested format, which reduces the number of processing steps and keeps the code simple. As part of a more elaborate data analysis, however, if there is no specific reason to keep the nested structure, it is often more practical to transform the nested list into a rectangular format and work with the unnested object (e.g. a data.frame) instead. In this follow-up post, we review in more detail the how options in rrapply() that unnest or melt nested lists into a rectangular format, and highlight the similarities and differences with several common alternatives in R.

Nested list to data.frame

Melt to long data.frame

The option how = "melt" in rrapply() unnests a nested list into a long or melted data.frame, similar in format to the output of the retired reshape2::melt() function applied to a nested list. Each row of the melted data.frame contains the node path of an individual element in the nested list after pruning (based on the condition and/or classes arguments). The "value" column is a vector- or list-column containing the values of the leaf elements, identical to the object returned by how = "flatten".
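To make this row/column layout concrete, below is a minimal base-R sketch of melting a nested list, assuming that every leaf is a length-one atomic vector. The helper melt_list() and its keep argument are hypothetical illustrations, not part of rrapply, and the sketch is far less efficient and complete than how = "melt" (no condition, f or list-column support). Each leaf becomes one row, its path of names fills the columns L1, L2, ..., and shorter paths are padded with NA. This padding is also why the L5 column in the output below is NA for most countries.

```r
## Hypothetical helper (not part of rrapply): melt a nested list into a long
## data.frame with one row per leaf, path columns L1, L2, ... and a value column.
## Assumes every leaf is a length-one atomic vector.
melt_list <- function(x, keep = function(leaf) TRUE) {
  paths <- list()
  values <- list()
  walk <- function(node, path) {
    if (is.list(node)) {
      nms <- names(node)
      if (is.null(nms)) nms <- rep("", length(node))
      for (i in seq_along(node)) {
        ## unnamed elements are labelled by their position, e.g. "1", "2", ...
        nm <- if (nzchar(nms[i])) nms[i] else as.character(i)
        walk(node[[i]], c(path, nm))
      }
    } else if (keep(node)) {
      ## record the path and value of each retained leaf
      paths[[length(paths) + 1L]] <<- path
      values[[length(values) + 1L]] <<- node
    }
  }
  walk(x, character(0))
  depth <- max(lengths(paths))
  ## pad shorter paths with NA so that every row has `depth` path columns
  padded <- lapply(paths, function(p) c(p, rep(NA_character_, depth - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("L", seq_len(depth))
  out$value <- unlist(values)
  out
}

## toy example in the spirit of renewable_energy_by_country
x <- list(World = list(Africa = list(Algeria = 0.08, Egypt = 5.69), Antarctica = NA))
melt_list(x)                     # three rows, L3 is NA for the shorter Antarctica path
melt_list(x, keep = is.numeric)  # drops the logical NA, loosely like classes = "numeric"
```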

To demonstrate, we use the renewable_energy_by_country dataset included in the rrapply-package, a nested list containing the renewable energy share per country (% of total energy consumption) in 2016¹. The 249 countries and areas are structured by geographical location according to the United Nations M49 standard. The numeric value listed for each country is a percentage; if no data is available, the country's value is NA.

library(rrapply)

## melt all data to long data.frame
rrapply(
  renewable_energy_by_country, 
  how = "melt"
) |>
  head(n = 10)
#>       L1     L2                 L3             L4                             L5 value
#> 1  World Africa    Northern Africa        Algeria                           <NA>  0.08
#> 2  World Africa    Northern Africa          Egypt                           <NA>  5.69
#> 3  World Africa    Northern Africa          Libya                           <NA>  1.64
#> 4  World Africa    Northern Africa        Morocco                           <NA> 11.02
#> 5  World Africa    Northern Africa          Sudan                           <NA> 61.64
#> 6  World Africa    Northern Africa        Tunisia                           <NA> 12.47
#> 7  World Africa    Northern Africa Western Sahara                           <NA>    NA
#> 8  World Africa Sub-Saharan Africa Eastern Africa British Indian Ocean Territory    NA
#> 9  World Africa Sub-Saharan Africa Eastern Africa                        Burundi 89.22
#> 10 World Africa Sub-Saharan Africa Eastern Africa                        Comoros 41.92
## drop logical NA's and melt to data.frame
rrapply(
  renewable_energy_by_country,
  classes = "numeric",
  how = "melt"
) |>
  head(n = 10)
#>       L1     L2                 L3             L4       L5 value
#> 1  World Africa    Northern Africa        Algeria     <NA>  0.08
#> 2  World Africa    Northern Africa          Egypt     <NA>  5.69
#> 3  World Africa    Northern Africa          Libya     <NA>  1.64
#> 4  World Africa    Northern Africa        Morocco     <NA> 11.02
#> 5  World Africa    Northern Africa          Sudan     <NA> 61.64
#> 6  World Africa    Northern Africa        Tunisia     <NA> 12.47
#> 7  World Africa Sub-Saharan Africa Eastern Africa  Burundi 89.22
#> 8  World Africa Sub-Saharan Africa Eastern Africa  Comoros 41.92
#> 9  World Africa Sub-Saharan Africa Eastern Africa Djibouti 28.50
#> 10 World Africa Sub-Saharan Africa Eastern Africa  Eritrea 80.14
## apply condition and melt to data.frame
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xparents) "Western Europe" %in% .xparents,
  how = "melt"
) |>
  head(n = 10)
#>      L1     L2             L3            L4 value
#> 1 World Europe Western Europe       Austria 34.67
#> 2 World Europe Western Europe       Belgium  9.14
#> 3 World Europe Western Europe        France 14.74
#> 4 World Europe Western Europe       Germany 14.17
#> 5 World Europe Western Europe Liechtenstein 62.93
#> 6 World Europe Western Europe    Luxembourg 13.54
#> 7 World Europe Western Europe        Monaco    NA
#> 8 World Europe Western Europe   Netherlands  5.78
#> 9 World Europe Western Europe   Switzerland 25.49

As shown in the above examples, in comparison to reshape2::melt(), rrapply() makes it possible to filter or transform list elements before melting the nested list through the f, classes and condition arguments². More importantly, rrapply() is optimized specifically for handling nested lists, whereas reshape2::melt() was aimed primarily at melting data.frames before being superseded by tidyr::gather() and, more recently, tidyr::pivot_longer(). For this reason, reshape2::melt() can be quite slow when applied to large nested lists:

## melt to long data.frame (reshape2)
reshape2::melt(renewable_energy_by_country) |>
  head(10)
#>    value             L4                             L5                 L3     L2    L1
#> 1   0.08        Algeria                           <NA>    Northern Africa Africa World
#> 2   5.69          Egypt                           <NA>    Northern Africa Africa World
#> 3   1.64          Libya                           <NA>    Northern Africa Africa World
#> 4  11.02        Morocco                           <NA>    Northern Africa Africa World
#> 5  61.64          Sudan                           <NA>    Northern Africa Africa World
#> 6  12.47        Tunisia                           <NA>    Northern Africa Africa World
#> 7     NA Western Sahara                           <NA>    Northern Africa Africa World
#> 8     NA Eastern Africa British Indian Ocean Territory Sub-Saharan Africa Africa World
#> 9  89.22 Eastern Africa                        Burundi Sub-Saharan Africa Africa World
#> 10 41.92 Eastern Africa                        Comoros Sub-Saharan Africa Africa World

## computation times
bench::mark(
  rrapply(renewable_energy_by_country),
  reshape2::melt(renewable_energy_by_country),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression                                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rrapply(renewable_energy_by_country)          14.3µs     17µs   50340.         0B     20.1
#> 2 reshape2::melt(renewable_energy_by_country)     46ms   52.2ms      18.3    73.1KB     27.5

For a medium-sized list such as the one in this example, the computation time of reshape2::melt() (around 50 ms in the benchmark above) is not a bottleneck in practice. However, the computational effort grows quickly when melting larger or more deeply nested lists:

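## helper function to generate large nested list:
## n children per node, d levels of nesting, runif(1) values at the leaves
new_list <- function(n, d) {
  v <- vector(mode = "list", length = n)
  rrapply(
    object = v,
    classes = c("list", "NULL"),
    ## only visit nodes up to depth d
    condition = \(x, .xpos) length(.xpos) <= d,
    ## above depth d, replace a node by a new list of n NULLs; at depth d, by a random number
    f = \(x, .xpos) if(length(.xpos) < d) v else runif(1),
    ## continue recursing into the newly created sublists
    how = "recurse"
  )
}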
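For readers less familiar with how = "recurse", a list of the same shape can also be generated with plain recursion in base R. The helper new_list_base() below is a hypothetical sketch, not from the original post: a call with depth d returns a list of n sublists of depth d - 1, and depth 0 returns a single runif(1) draw, so the result has n^d leaves (100^3 = 10^6 for shallow_list and 2^18 = 262,144 for deep_list). It is not guaranteed to produce the random draws in the same order as new_list().

```r
## Hypothetical base-R counterpart of new_list() (not from the original post):
## n children per node, d levels of nesting and a runif(1) draw at each leaf
new_list_base <- function(n, d) {
  if (d == 0) return(runif(1))
  lapply(seq_len(n), function(i) new_list_base(n, d - 1))
}

set.seed(1)
small_list <- new_list_base(n = 3, d = 4)
length(unlist(small_list))  # 3^4 = 81 leaves
```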

## random seed
set.seed(1)

## generate large shallow list (10^6 elements)
shallow_list <- new_list(n = 100, d = 3)
str(shallow_list, list.len = 2)
#> List of 100
#>  $ :List of 100
#>   ..$ :List of 100
#>   .. ..$ : num 0.266
#>   .. ..$ : num 0.372
#>   .. .. [list output truncated]
#>   ..$ :List of 100
#>   .. ..$ : num 0.655
#>   .. ..$ : num 0.353
#>   .. .. [list output truncated]
#>   .. [list output truncated]
#>  $ :List of 100
#>   ..$ :List of 100
#>   .. ..$ : num 0.0647
#>   .. ..$ : num 0.677
#>   .. .. [list output truncated]
#>   ..$ :List of 100
#>   .. ..$ : num 0.266
#>   .. ..$ : num 0.66
#>   .. .. [list output truncated]
#>   .. [list output truncated]
#>   [list output truncated]

## benchmark timing with rrapply
system.time(shallow_melt <- rrapply(shallow_list, how = "melt")) 
#>    user  system elapsed 
#>   1.183   0.040   1.223
head(shallow_melt)
#>   L1 L2 L3     value
#> 1  1  1  1 0.2655087
#> 2  1  1  2 0.3721239
#> 3  1  1  3 0.5728534
#> 4  1  1  4 0.9082078
#> 5  1  1  5 0.2016819
#> 6  1  1  6 0.8983897

## benchmark timing with reshape2::melt
system.time(shallow_melt_reshape2 <- reshape2::melt(shallow_list))
#>    user  system elapsed 
#> 163.969   0.036 164.019
head(shallow_melt_reshape2)
#>       value L3 L2 L1
#> 1 0.2655087  1  1  1
#> 2 0.3721239  2  1  1
#> 3 0.5728534  3  1  1
#> 4 0.9082078  4  1  1
#> 5 0.2016819  5  1  1
#> 6 0.8983897  6  1  1

## generate large deeply nested list (2^18 elements)
deep_list <- new_list(n = 2, d = 18)

## benchmark timing with rrapply
system.time(deep_melt <- rrapply(deep_list, how = "melt")) 
#>    user  system elapsed 
#>   0.761   0.008   0.769
head(deep_melt)
#>   L1 L2 L3 L4 L5 L6 L7 L8 L9 L10 L11 L12 L13 L14 L15 L16 L17 L18      value
#> 1  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1 0.14011775
#> 2  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   2 0.69562066
#> 3  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   2   1 0.72888445
#> 4  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   2   2 0.09164734
#> 5  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   2   1   1 0.06661200
#> 6  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   2   1   2 0.61285721

## benchmark timing with reshape2::melt
system.time(deep_melt_reshape2 <- reshape2::melt(deep_list))
#>    user  system elapsed 
#> 125.361   0.040 125.448
head(deep_melt_reshape2)
#>        value L18 L17 L16 L15 L14 L13 L12 L11 L10 L9 L8 L7 L6 L5 L4 L3 L2 L1
#> 1 0.14011775   1   1   1   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1
#> 2 0.69562066   2   1   1   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1
#> 3 0.72888445   1   2   1   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1
#> 4 0.09164734   2   2   1   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1
#> 5 0.06661200   1   1   2   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1
#> 6 0.61285721   2   1   2   1   1   1   1   1   1  1  1  1  1  1  1  1  1  1

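Although such large or deeply nested lists are unlikely to be encountered in practice, these artificial examples illustrate that reshape2::melt() is not particularly efficient at unnesting large nested lists to data.frames: on the shallow list it takes roughly 164 seconds compared to 1.2 seconds for rrapply(), and on the deep list roughly 125 seconds compared to 0.8 seconds.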

Bind to wide data.frame

The option how = "bind" unnests a nested list into a wide data.frame and is intended for nested lists containing repeated entries of the same variables. To illustrate, we consider the pokedex dataset included in the rrapply-package, a nested list containing various property values for each of the 151 original Pokémon, available in JSON format from https://github.com/Biuni/PokemonGO-Pokedex.

## all 151 Pokemon
str(pokedex, list.len = 3)
#> List of 1
#>  $ pokemon:List of 151
#>   ..$ :List of 16
#>   .. ..$ id            : int 1
#>   .. ..$ num           : chr "001"
#>   .. ..$ name          : chr "Bulbasaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 17
#>   .. ..$ id            : int 2
#>   .. ..$ num           : chr "002"
#>   .. ..$ name          : chr "Ivysaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 15
#>   .. ..$ id            : int 3
#>   .. ..$ num           : chr "003"
#>   .. ..$ name          : chr "Venusaur"
#>   .. .. [list output truncated]
#>   .. [list output truncated]

## single Pokemon entry
str(pokedex[["pokemon"]][[1]])
#> List of 16
#>  $ id            : int 1
#>  $ num           : chr "001"
#>  $ name          : chr "Bulbasaur"
#>  $ img           : chr "http://www.serebii.net/pokemongo/pokemon/001.png"
#>  $ type          : chr [1:2] "Grass" "Poison"
#>  $ height        : chr "0.71 m"
#>  $ weight        : chr "6.9 kg"
#>  $ candy         : chr "Bulbasaur Candy"
#>  $ candy_count   : int 25
#>  $ egg           : chr "2 km"
#>  $ spawn_chance  : num 0.69
#>  $ avg_spawns    : int 69
#>  $ spawn_time    : chr "20:00"
#>  $ multipliers   : num 1.58
#>  $ weaknesses    : chr [1:4] "Fire" "Ice" "Flying" "Psychic"
#>  $ next_evolution:List of 2
#>   ..$ :List of 2
#>   .. ..$ num : chr "002"
#>   .. ..$ name: chr "Ivysaur"
#>   ..$ :List of 2
#>   .. ..$ num : chr "003"
#>   .. ..$ name: chr "Venusaur"

Calling rrapply() with how = "bind" expands each Pokémon sublist into a single row of a wide data.frame. The 151 rows are stacked and aligned by matching variable names, with missing entries replaced by NA’s (similar to data.table::rbindlist(..., fill = TRUE)). Note that any nested variables, such as next_evolution and prev_evolution, are unnested as wide as possible into individual data.frame columns, similar to repeated application of tidyr::unnest_wider() to a data.frame with nested list-columns.

rrapply(pokedex, how = "bind")[, 1:9] |>
  head()
#>   id num       name                                              img          type height   weight            candy candy_count
#> 1  1 001  Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png Grass, Poison 0.71 m   6.9 kg  Bulbasaur Candy          25
#> 2  2 002    Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png Grass, Poison 0.99 m  13.0 kg  Bulbasaur Candy         100
#> 3  3 003   Venusaur http://www.serebii.net/pokemongo/pokemon/003.png Grass, Poison 2.01 m 100.0 kg  Bulbasaur Candy          NA
#> 4  4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png          Fire 0.61 m   8.5 kg Charmander Candy          25
#> 5  5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png          Fire 1.09 m  19.0 kg Charmander Candy         100
#> 6  6 006  Charizard http://www.serebii.net/pokemongo/pokemon/006.png  Fire, Flying 1.70 m  90.5 kg Charmander Candy          NA
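The stacking and NA-filling behaviour described above can be sketched in a few lines of base R for the simplest case of a flat list of records. The helper bind_records() below is a hypothetical illustration, not rrapply's implementation: fields are aligned by name, missing fields become NA, and fields that are not length-one atomic values (such as type) are kept as list-columns. Unlike how = "bind", it does not expand nested sublists such as next_evolution into further columns.

```r
## Hypothetical helper (not part of rrapply): row-bind a list of flat records,
## aligning fields by name and filling missing fields with NA.
## Nested sublists are NOT expanded further, unlike how = "bind".
bind_records <- function(records) {
  ## all field names, in order of first appearance
  cols <- unique(unlist(lapply(records, names)))
  out <- lapply(cols, function(col) {
    ## r[[col]] is NULL when a record lacks the field; replace it by NA
    vals <- lapply(records, function(r) if (is.null(r[[col]])) NA else r[[col]])
    scalar <- vapply(vals, function(v) is.atomic(v) && length(v) == 1L, logical(1))
    ## length-one atomic fields become an ordinary column, anything else a list-column
    if (all(scalar)) unlist(vals) else I(vals)
  })
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

records <- list(
  list(id = 1L, name = "Bulbasaur", type = c("Grass", "Poison"), candy_count = 25L),
  list(id = 3L, name = "Venusaur", type = c("Grass", "Poison")),
  list(id = 4L, name = "Charmander", type = "Fire", candy_count = 25L)
)
bind_records(records)
```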

By default, the list layer containing the repeated observations is identified by the minimal depth detected across leaf elements. This default can be overridden with the coldepth parameter in the options argument, which is useful to unnest deeper sublists, such as next_evolution or prev_evolution. In addition, setting namecols = TRUE in the options argument adds the parent list names associated with each row of the wide data.frame as individual columns L1, L2, etc.

## bind prev/next evolution columns
rrapply(
  pokedex, 
  how = "bind",
  options = list(coldepth = 5, namecols = TRUE)
) |>
  head(n = 10)
#>         L1 L2             L3 L4 num       name
#> 1  pokemon  1 next_evolution  1 002    Ivysaur
#> 2  pokemon  1 next_evolution  2 003   Venusaur
#> 3  pokemon  2 prev_evolution  1 001  Bulbasaur
#> 4  pokemon  2 next_evolution  1 003   Venusaur
#> 5  pokemon  3 prev_evolution  1 001  Bulbasaur
#> 6  pokemon  3 prev_evolution  2 002    Ivysaur
#> 7  pokemon  4 next_evolution  1 005 Charmeleon
#> 8  pokemon  4 next_evolution  2 006  Charizard
#> 9  pokemon  5 prev_evolution  1 004 Charmander
#> 10 pokemon  5 next_evolution  1 006  Charizard
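For intuition about the rows produced with coldepth = 5 and namecols = TRUE, here is a hypothetical base-R sketch restricted to the next_evolution sublists, so that the constant columns L1 = "pokemon" and L3 = "next_evolution" are left out. It assumes the pokedex layout shown above, in particular that every next_evolution entry has num and name fields, and it only illustrates the row structure; it does not replicate rrapply.

```r
## Hypothetical helper (not part of rrapply): one row per next_evolution entry,
## with the position of the parent Pokemon (L2) and of the entry itself (L4).
## Assumes every entry is a list with character fields "num" and "name".
next_evolution_long <- function(pokemon) {
  rows <- lapply(seq_along(pokemon), function(i) {
    ev <- pokemon[[i]][["next_evolution"]]
    ## Pokemon without a next evolution contribute no rows
    if (is.null(ev)) return(NULL)
    data.frame(
      L2 = i,
      L4 = seq_along(ev),
      num = vapply(ev, `[[`, character(1), "num"),
      name = vapply(ev, `[[`, character(1), "name")
    )
  })
  ## rbind() skips the NULL entries
  do.call(rbind, rows)
}

toy <- list(
  list(id = 1L, next_evolution = list(list(num = "002", name = "Ivysaur"),
                                      list(num = "003", name = "Venusaur"))),
  list(id = 2L, next_evolution = list(list(num = "003", name = "Venusaur"))),
  list(id = 3L)
)
next_evolution_long(toy)
```

Applied to pokedex[["pokemon"]], this sketch should give the same L2, L4, num and name values as the next_evolution rows of the output above, possibly with different column types.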

Common alternatives

Several common alternatives used to unnest lists containing repeated entries include data.table::rbindlist(), dplyr::bind_rows(), and tidyr’s dedicated rectangling functions unnest_longer(), unnest_wider() and hoist().

The first two functions are primarily aimed at binding lists of data.frames or lists of lists, but are not meant for nested lists containing multiple levels of nesting, such as pokedex:

library(dplyr)

## simple list of lists
lapply(pokedex[["pokemon"]], `[`, 1:4) |>
  bind_rows() |> 
  head()
#> # A tibble: 6 × 4
#>      id num   name       img                                             
#>   <int> <chr> <chr>      <chr>                                           
#> 1     1 001   Bulbasaur  http://www.serebii.net/pokemongo/pokemon/001.png
#> 2     2 002   Ivysaur    http://www.serebii.net/pokemongo/pokemon/002.png
#> 3     3 003   Venusaur   http://www.serebii.net/pokemongo/pokemon/003.png
#> 4     4 004   Charmander http://www.serebii.net/pokemongo/pokemon/004.png
#> 5     5 005   Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png
#> 6     6 006   Charizard  http://www.serebii.net/pokemongo/pokemon/006.png

## complex nested list (error)
bind_rows(pokedex[["pokemon"]])
#> Error in `vctrs::data_frame()`:
#> ! Can't recycle `id` (size 2) to match `weaknesses` (size 4).

## simple list of lists
lapply(pokedex[["pokemon"]], `[`, 1:4) |>
  data.table::rbindlist() |>
  head()
#>    id num       name                                              img
#> 1:  1 001  Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png
#> 2:  2 002    Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png
#> 3:  3 003   Venusaur http://www.serebii.net/pokemongo/pokemon/003.png
#> 4:  4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png
#> 5:  5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png
#> 6:  6 006  Charizard http://www.serebii.net/pokemongo/pokemon/006.png

## complex nested list (error)
data.table::rbindlist(pokedex[["pokemon"]])
#> Error in data.table::rbindlist(pokedex[["pokemon"]]): Column 5 of item 1 is length 2 inconsistent with column 15 which is length 4. Only length-1 columns are recycled.

The rectangling functions in the tidyr-package offer a lot more flexibility. A data.frame similar to the one returned by rrapply(pokedex, how = "bind") can be obtained by repeated application of tidyr::unnest_wider():

library(tidyr)
library(tibble)

as_tibble(pokedex) |>
  unnest_wider(pokemon) |>
  unnest_wider(next_evolution, names_sep = ".") |>
  unnest_wider(prev_evolution, names_sep = ".") |>
  unnest_wider(next_evolution.1, names_sep = ".") |>
  unnest_wider(next_evolution.2, names_sep = ".") |>
  unnest_wider(next_evolution.3, names_sep = ".") |>
  unnest_wider(prev_evolution.1, names_sep = ".") |>
  unnest_wider(prev_evolution.2, names_sep = ".") |>
  head()
#> # A tibble: 6 × 25
#>      id num   name       img   type  height weight candy candy…¹ egg   spawn…² avg_s…³ spawn…⁴ multi…⁵ weakn…⁶ next_…⁷ next_…⁸ next_…⁹ next_…˟ next_…˟
#>   <int> <chr> <chr>      <chr> <lis> <chr>  <chr>  <chr>   <int> <chr>   <dbl>   <dbl> <chr>   <list>  <list>  <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1     1 001   Bulbasaur  http… <chr> 0.71 m 6.9 kg Bulb…      25 2 km   0.69     69    20:00   <dbl>   <chr>   002     Ivysaur 003     Venusa… <NA>   
#> 2     2 002   Ivysaur    http… <chr> 0.99 m 13.0 … Bulb…     100 Not …  0.042     4.2  07:00   <dbl>   <chr>   003     Venusa… <NA>    <NA>    <NA>   
#> 3     3 003   Venusaur   http… <chr> 2.01 m 100.0… Bulb…      NA Not …  0.017     1.7  11:30   <dbl>   <chr>   <NA>    <NA>    <NA>    <NA>    <NA>   
#> 4     4 004   Charmander http… <chr> 0.61 m 8.5 kg Char…      25 2 km   0.253    25.3  08:45   <dbl>   <chr>   005     Charme… 006     Chariz… <NA>   
#> 5     5 005   Charmeleon http… <chr> 1.09 m 19.0 … Char…     100 Not …  0.012     1.2  19:00   <dbl>   <chr>   006     Chariz… <NA>    <NA>    <NA>   
#> 6     6 006   Charizard  http… <chr> 1.70 m 90.5 … Char…      NA Not …  0.0031    0.31 13:34   <dbl>   <chr>   <NA>    <NA>    <NA>    <NA>    <NA>   
#> # … with 5 more variables: next_evolution.3.name <chr>, prev_evolution.1.num <chr>, prev_evolution.1.name <chr>, prev_evolution.2.num <chr>,
#> #   prev_evolution.2.name <chr>, and abbreviated variable names ¹​candy_count, ²​spawn_chance, ³​avg_spawns, ⁴​spawn_time, ⁵​multipliers, ⁶​weaknesses,
#> #   ⁷​next_evolution.1.num, ⁸​next_evolution.1.name, ⁹​next_evolution.2.num, ˟​next_evolution.2.name, ˟​next_evolution.3.num
#> # ℹ Use `colnames()` to see all variable names

The option how = "bind" in rrapply() is less flexible, as it always expands the nested list to a data.frame that is as wide as possible. On the other hand, the flexibility and interpretability of tidyr’s rectangling functions come at the cost of increased computational effort, which can become a bottleneck when unnesting large nested lists:

## large replicated pokedex list 
pokedex_large <- list(pokemon = do.call(c, replicate(1500, pokedex[["pokemon"]], simplify = FALSE)))

system.time({
  rrapply(pokedex_large, how = "bind")
})
#>    user  system elapsed 
#>   2.155   0.044   2.204

## unnest first layers prev_evolution and next_evolution
system.time({
  as_tibble(pokedex_large) |>
    unnest_wider(pokemon) |>
    unnest_wider(next_evolution, names_sep = ".") |>
    unnest_wider(prev_evolution, names_sep = ".") 
})
#>    user  system elapsed 
#> 126.633   0.320 127.060

Remark: in the chained calls to unnest_wider() above, we only unnest the first layer of the next_evolution and prev_evolution list-columns, and not any of the resulting children list-columns, which would only further increase computation time.

To extract and unnest sublists at deeper levels of nesting in the list, such as next_evolution, we manually set the coldepth parameter in the options argument, as also demonstrated above:

system.time({
  ev1 <- rrapply(
    pokedex_large, 
    condition = \(x, .xparents) "next_evolution" %in% .xparents,
    how = "bind",
    options = list(namecols = TRUE, coldepth = 5)
  )
})
#>    user  system elapsed 
#>   1.837   0.000   1.837
head(ev1)
#>        L1 L2             L3 L4 num       name
#> 1 pokemon  1 next_evolution  1 002    Ivysaur
#> 2 pokemon  1 next_evolution  2 003   Venusaur
#> 3 pokemon  2 next_evolution  1 003   Venusaur
#> 4 pokemon  4 next_evolution  1 005 Charmeleon
#> 5 pokemon  4 next_evolution  2 006  Charizard
#> 6 pokemon  5 next_evolution  1 006  Charizard

The same unnested version of the next_evolution sublists can be obtained by mixing several calls to unnest_wider() and unnest_longer():

system.time({
  ev2 <- as_tibble(pokedex_large) |>
    unnest_wider(pokemon) |>
    unnest_longer(next_evolution) |>
    unnest_wider(next_evolution, names_sep = "_") |>
    select(id, next_evolution_num, next_evolution_name)
})
#>    user  system elapsed 
#>  96.874   0.040  96.941
head(ev2)
#> # A tibble: 6 × 3
#>      id next_evolution_num next_evolution_name
#>   <int> <chr>              <chr>              
#> 1     1 002                Ivysaur            
#> 2     1 003                Venusaur           
#> 3     2 003                Venusaur           
#> 4     3 <NA>               <NA>               
#> 5     4 005                Charmeleon         
#> 6     4 006                Charizard

In the context of the current example, a more efficient approach is to combine unnest_wider() with hoist(). The disadvantage is that we need to manually specify the exact locations of the elements that we wish to hoist from the nested list:

system.time({
  ev3 <- as_tibble(pokedex_large) |>
    unnest_wider(pokemon) |>
    hoist(next_evolution, 
          name.1 = list(1, "name"),
          name.2 = list(2, "name"),
          name.3 = list(3, "name")
    ) |>
    select(id, name.1, name.2, name.3)
})
#>    user  system elapsed 
#>  42.608   0.124  42.734
head(ev3)
#> # A tibble: 6 × 4
#>      id name.1     name.2    name.3
#>   <int> <chr>      <chr>     <chr> 
#> 1     1 Ivysaur    Venusaur  <NA>  
#> 2     2 Venusaur   <NA>      <NA>  
#> 3     3 <NA>       <NA>      <NA>  
#> 4     4 Charmeleon Charizard <NA>  
#> 5     5 Charizard  <NA>      <NA>  
#> 6     6 <NA>       <NA>      <NA>

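Using rrapply(), essentially the same result can be obtained by adding a call to reshape() (or alternatively, e.g., tidyr::pivot_wider() or data.table::dcast()) that converts the long data.frame to a wide one. Unlike ev3, the result only contains rows for Pokémon that actually have a next evolution, and the column L2 holds the position of each Pokémon in the list rather than its id: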

system.time({
  ev4 <- rrapply(
    pokedex_large, 
    condition = \(x, .xparents) "next_evolution" %in% .xparents,
    how = "bind", 
    options = list(namecols = TRUE, coldepth = 5)
  ) 
  ev5 <- reshape(
    ev4[, c("L2", "L4", "name")],
    idvar = "L2",
    timevar = "L4",
    v.names = "name",
    direction = "wide"
  )
})
#>    user  system elapsed 
#>   2.212   0.000   2.212
head(ev5)
#>   L2     name.1    name.2 name.3
#> 1  1    Ivysaur  Venusaur   <NA>
#> 3  2   Venusaur      <NA>   <NA>
#> 4  4 Charmeleon Charizard   <NA>
#> 6  5  Charizard      <NA>   <NA>
#> 7  7  Wartortle Blastoise   <NA>
#> 9  8  Blastoise      <NA>   <NA>
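As a small illustration of the reshape() arguments, the following applies the same call to a few long-format rows copied from the ev1 output shown earlier: idvar identifies the rows of the wide result, timevar supplies the suffixes of the new columns, and v.names names the variable that is spread across them. This is only a toy check of the call, not a benchmark.

```r
## a few long-format rows taken from the ev1 output above
long <- data.frame(
  L2 = c(1, 1, 2, 4, 4),
  L4 = c(1, 2, 1, 1, 2),
  name = c("Ivysaur", "Venusaur", "Venusaur", "Charmeleon", "Charizard")
)
## idvar: one output row per Pokemon position
## timevar: values used as suffixes of the new columns (name.1, name.2)
## v.names: the variable spread across those columns
wide <- reshape(long, idvar = "L2", timevar = "L4", v.names = "name", direction = "wide")
wide
```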

Additional examples

We conclude this section by replicating some of the data rectangling examples presented in the tidyr vignette: https://tidyr.tidyverse.org/articles/rectangle.html. The example nested lists are all conveniently included in the repurrrsive-package.

GitHub Users

library(repurrrsive)

## nested data
str(gh_users, list.len = 3)
#> List of 6
#>  $ :List of 30
#>   ..$ login              : chr "gaborcsardi"
#>   ..$ id                 : int 660288
#>   ..$ avatar_url         : chr "https://avatars.githubusercontent.com/u/660288?v=3"
#>   .. [list output truncated]
#>  $ :List of 30
#>   ..$ login              : chr "jennybc"
#>   ..$ id                 : int 599454
#>   ..$ avatar_url         : chr "https://avatars.githubusercontent.com/u/599454?v=3"
#>   .. [list output truncated]
#>  $ :List of 30
#>   ..$ login              : chr "jtleek"
#>   ..$ id                 : int 1571674
#>   ..$ avatar_url         : chr "https://avatars.githubusercontent.com/u/1571674?v=3"
#>   .. [list output truncated]
#>   [list output truncated]

## unnested version
rrapply(gh_users, how = "bind") |>
  as_tibble()
#> # A tibble: 6 × 30
#>   login     id avata…¹ grava…² url   html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷ subsc…⁸ organ…⁹ repos…˟ event…˟ recei…˟ type  site_…˟ name  company blog 
#>   <chr>  <int> <chr>   <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr> <lgl>   <chr> <list>  <chr>
#> 1 gabo… 6.60e5 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Gábo… <chr>   http…
#> 2 jenn… 5.99e5 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Jenn… <chr>   http…
#> 3 jtle… 1.57e6 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Jeff… <NULL>  http…
#> 4 juli… 1.25e7 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Juli… <NULL>  juli…
#> 5 leep… 3.51e6 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Thom… <chr>   http…
#> 6 masa… 8.36e6 https:… ""      http… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… User  FALSE   Maël… <chr>   http…
#> # … with 10 more variables: location <chr>, email <list>, hireable <list>, bio <list>, public_repos <int>, public_gists <int>, followers <int>,
#> #   following <int>, created_at <chr>, updated_at <chr>, and abbreviated variable names ¹​avatar_url, ²​gravatar_id, ³​html_url, ⁴​followers_url,
#> #   ⁵​following_url, ⁶​gists_url, ⁷​starred_url, ⁸​subscriptions_url, ⁹​organizations_url, ˟​repos_url, ˟​events_url, ˟​received_events_url, ˟​site_admin
#> # ℹ Use `colnames()` to see all variable names

GitHub repos

## nested data
str(gh_repos, list.len = 2)
#> List of 6
#>  $ :List of 30
#>   ..$ :List of 68
#>   .. ..$ id               : int 61160198
#>   .. ..$ name             : chr "after"
#>   .. .. [list output truncated]
#>   ..$ :List of 68
#>   .. ..$ id               : int 40500181
#>   .. ..$ name             : chr "argufy"
#>   .. .. [list output truncated]
#>   .. [list output truncated]
#>  $ :List of 30
#>   ..$ :List of 68
#>   .. ..$ id               : int 14756210
#>   .. ..$ name             : chr "2013-11_sfu"
#>   .. .. [list output truncated]
#>   ..$ :List of 68
#>   .. ..$ id               : int 14152301
#>   .. ..$ name             : chr "2014-01-27-miami"
#>   .. .. [list output truncated]
#>   .. [list output truncated]
#>   [list output truncated]

## unnested version
rrapply(gh_repos, how = "bind") |>
  as_tibble()
#> # A tibble: 176 × 84
#>          id name       full_…¹ owner…² owner…³ owner…⁴ owner…⁵ owner…⁶ owner…⁷ owner…⁸ owner…⁹ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟ owner…˟
#>       <int> <chr>      <chr>   <chr>     <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 61160198 after      gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  2 40500181 argufy     gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  3 36442442 ask        gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  4 34924886 baseimpor… gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  5 61620661 citest     gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  6 33907457 clisymbols gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  7 37236467 cmaker     gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  8 67959624 cmark      gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#>  9 63152619 conditions gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> 10 24343686 crayon     gaborc… gaborc…  660288 https:… ""      https:… https:… https:… https:… https:… https:… https:… https:… https:… https:… https:…
#> # … with 166 more rows, 66 more variables: owner.type <chr>, owner.site_admin <lgl>, private <lgl>, html_url <chr>, description <list>, fork <lgl>,
#> #   url <chr>, forks_url <chr>, keys_url <chr>, collaborators_url <chr>, teams_url <chr>, hooks_url <chr>, issue_events_url <chr>, events_url <chr>,
#> #   assignees_url <chr>, branches_url <chr>, tags_url <chr>, blobs_url <chr>, git_tags_url <chr>, git_refs_url <chr>, trees_url <chr>,
#> #   statuses_url <chr>, languages_url <chr>, stargazers_url <chr>, contributors_url <chr>, subscribers_url <chr>, subscription_url <chr>,
#> #   commits_url <chr>, git_commits_url <chr>, comments_url <chr>, issue_comment_url <chr>, contents_url <chr>, compare_url <chr>, merges_url <chr>,
#> #   archive_url <chr>, downloads_url <chr>, issues_url <chr>, pulls_url <chr>, milestones_url <chr>, notifications_url <chr>, labels_url <chr>,
#> #   releases_url <chr>, deployments_url <chr>, created_at <chr>, updated_at <chr>, pushed_at <chr>, git_url <chr>, ssh_url <chr>, clone_url <chr>, …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Game of Thrones characters

## nested data
str(got_chars, list.len = 3)
#> List of 30
#>  $ :List of 18
#>   ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1022"
#>   ..$ id         : int 1022
#>   ..$ name       : chr "Theon Greyjoy"
#>   .. [list output truncated]
#>  $ :List of 18
#>   ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1052"
#>   ..$ id         : int 1052
#>   ..$ name       : chr "Tyrion Lannister"
#>   .. [list output truncated]
#>  $ :List of 18
#>   ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1074"
#>   ..$ id         : int 1074
#>   ..$ name       : chr "Victarion Greyjoy"
#>   .. [list output truncated]
#>   [list output truncated]

## unnested version
rrapply(got_chars, how = "bind") |>
  as_tibble()
#> # A tibble: 30 × 18
#>    url                             id name  gender culture born  died  alive titles aliases father mother spouse alleg…¹ books povBo…² tvSer…³ playe…⁴
#>    <chr>                        <int> <chr> <chr>  <chr>   <chr> <chr> <lgl> <list> <list>  <chr>  <chr>  <chr>  <list>  <lis> <list>  <list>  <list> 
#>  1 https://www.anapioficeandfi…  1022 Theo… Male   "Ironb… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""     <chr>   <chr> <chr>   <chr>   <chr>  
#>  2 https://www.anapioficeandfi…  1052 Tyri… Male   ""      "In … ""    TRUE  <chr>  <chr>   ""     ""     "http… <chr>   <chr> <chr>   <chr>   <chr>  
#>  3 https://www.anapioficeandfi…  1074 Vict… Male   "Ironb… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""     <chr>   <chr> <chr>   <chr>   <chr>  
#>  4 https://www.anapioficeandfi…  1109 Will  Male   ""      ""    "In … FALSE <chr>  <chr>   ""     ""     ""     <lgl>   <chr> <chr>   <chr>   <chr>  
#>  5 https://www.anapioficeandfi…  1166 Areo… Male   "Norvo… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""     <chr>   <chr> <chr>   <chr>   <chr>  
#>  6 https://www.anapioficeandfi…  1267 Chett Male   ""      "At … "In … FALSE <chr>  <chr>   ""     ""     ""     <lgl>   <chr> <chr>   <chr>   <chr>  
#>  7 https://www.anapioficeandfi…  1295 Cres… Male   ""      "In … "In … FALSE <chr>  <chr>   ""     ""     ""     <lgl>   <chr> <chr>   <chr>   <chr>  
#>  8 https://www.anapioficeandfi…   130 Aria… Female "Dorni… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""     <chr>   <chr> <chr>   <chr>   <chr>  
#>  9 https://www.anapioficeandfi…  1303 Daen… Female "Valyr… "In … ""    TRUE  <chr>  <chr>   ""     ""     "http… <chr>   <chr> <chr>   <chr>   <chr>  
#> 10 https://www.anapioficeandfi…  1319 Davo… Male   "Weste… "In … ""    TRUE  <chr>  <chr>   ""     ""     "http… <chr>   <chr> <chr>   <chr>   <chr>  
#> # … with 20 more rows, and abbreviated variable names ¹​allegiances, ²​povBooks, ³​tvSeries, ⁴​playedBy
#> # ℹ Use `print(n = ...)` to see more rows

Sharla Gelfand’s discography

## nested data (first element)
str(discog[1], list.len = 3)
#> List of 1
#>  $ :List of 5
#>   ..$ instance_id      : int 354823933
#>   ..$ date_added       : chr "2019-02-16T17:48:59-08:00"
#>   ..$ basic_information:List of 11
#>   .. ..$ labels      :List of 1
#>   .. .. ..$ :List of 6
#>   .. .. .. ..$ name            : chr "Tobi Records (2)"
#>   .. .. .. ..$ entity_type     : chr "1"
#>   .. .. .. ..$ catno           : chr "TOB-013"
#>   .. .. .. .. [list output truncated]
#>   .. ..$ year        : int 2015
#>   .. ..$ master_url  : NULL
#>   .. .. [list output truncated]
#>   .. [list output truncated]

## unnested version (excluding deeply nested sublists)
discs <- rrapply(
  discog,
  condition = \(x, .xpos) length(.xpos) < 5,
  f = \(x) ifelse(is.null(x), NA, x),  ## replace NULLs
  how = "bind"
)
as_tibble(discs)
#> # A tibble: 155 × 12
#>    instance_id date_added                basic_information.year basic_information.mast…¹ basic…² basic…³ basic…⁴ basic…⁵ basic…⁶ basic…⁷     id rating
#>          <int> <chr>                                      <int> <chr>                      <int> <chr>   <chr>   <chr>   <chr>     <int>  <int>  <int>
#>  1   354823933 2019-02-16T17:48:59-08:00                   2015 <NA>                      7.50e6 "https… Demo    https:… https:…       0 7.50e6      0
#>  2   354092601 2019-02-13T14:13:11-08:00                   2013 https://api.discogs.com…  4.49e6 "https… Observ… https:… https:…  553057 4.49e6      0
#>  3   354091476 2019-02-13T14:07:23-08:00                   2017 https://api.discogs.com…  9.83e6 "https… I       https:… https:… 1109943 9.83e6      0
#>  4   351244906 2019-02-02T11:39:58-08:00                   2017 https://api.discogs.com…  9.77e6 "https… Oído A… https:… https:… 1128934 9.77e6      0
#>  5   351244801 2019-02-02T11:39:37-08:00                   2015 https://api.discogs.com…  7.24e6 "https… A Cat'… https:… https:…  857592 7.24e6      0
#>  6   351052065 2019-02-01T20:40:53-08:00                   2019 https://api.discogs.com…  1.31e7 "https… Tashme  https:… https:… 1498137 1.31e7      0
#>  7   350315345 2019-01-29T15:48:37-08:00                   2014 https://api.discogs.com…  7.11e6 "https… Demo    https:… https:…  852880 7.11e6      0
#>  8   350315103 2019-01-29T15:47:22-08:00                   2015 https://api.discogs.com…  1.05e7 "https… Let Th… https:… https:…  869410 1.05e7      0
#>  9   350314507 2019-01-29T15:44:08-08:00                   2017 https://api.discogs.com…  1.13e7 ""      Sub Sp… https:… https:… 1281224 1.13e7      0
#> 10   350314047 2019-01-29T15:41:35-08:00                   2017 <NA>                      1.17e7 "https… Demo    https:… https:…       0 1.17e7      0
#> # … with 145 more rows, and abbreviated variable names ¹​basic_information.master_url, ²​basic_information.id, ³​basic_information.thumb,
#> #   ⁴​basic_information.title, ⁵​basic_information.cover_image, ⁶​basic_information.resource_url, ⁷​basic_information.master_id
#> # ℹ Use `print(n = ...)` to see more rows

## unnest labels sublists 
labels <- rrapply(
  discog,
  condition = \(x, .xparents) "labels" %in% .xparents,
  how = "bind",
  options = list(coldepth = 5, namecols = TRUE)
)
as_tibble(labels)
#> # A tibble: 182 × 10
#>    L1    L2                L3     L4    name                                      entity_type catno   resource_url                          id entit…¹
#>    <chr> <chr>             <chr>  <chr> <chr>                                     <chr>       <chr>   <chr>                              <int> <chr>  
#>  1 1     basic_information labels 1     Tobi Records (2)                          1           TOB-013 https://api.discogs.com/labels/6… 633407 Label  
#>  2 2     basic_information labels 1     La Vida Es Un Mus                         1           Mus70   https://api.discogs.com/labels/3…  38322 Label  
#>  3 3     basic_information labels 1     La Vida Es Un Mus                         1           MUS118  https://api.discogs.com/labels/3…  38322 Label  
#>  4 4     basic_information labels 1     La Vida Es Un Mus                         1           MUS132  https://api.discogs.com/labels/3…  38322 Label  
#>  5 4     basic_information labels 2     Beat Generation                           1           BEAT64  https://api.discogs.com/labels/8…  88198 Label  
#>  6 4     basic_information labels 3     Beat Generation                           1           BEAT 64 https://api.discogs.com/labels/8…  88198 Label  
#>  7 5     basic_information labels 1     Katorga Works                             1           KW-043  https://api.discogs.com/labels/2… 205895 Label  
#>  8 6     basic_information labels 1     High Fashion Industries                   1           HFI017  https://api.discogs.com/labels/6… 637837 Label  
#>  9 7     basic_information labels 1     Mind Control Records (6)                  1           none    https://api.discogs.com/labels/7… 763103 Label  
#> 10 8     basic_information labels 1     Not On Label (Phantom Head Self-released) 1           none    https://api.discogs.com/labels/8… 879916 Label  
#> # … with 172 more rows, and abbreviated variable name ¹​entity_type_name
#> # ℹ Use `print(n = ...)` to see more rows

## merge disc ids with labels
merge(
  x = data.frame(L1 = rownames(discs), disc_id = discs[, "id"]),
  y = labels, 
  by = "L1", 
  sort = FALSE
) |>
  as_tibble()
#> # A tibble: 182 × 11
#>    L1     disc_id L2                L3     L4    name                                      entity_type catno   resource_url                 id entit…¹
#>    <chr>    <int> <chr>             <chr>  <chr> <chr>                                     <chr>       <chr>   <chr>                     <int> <chr>  
#>  1 1      7496378 basic_information labels 1     Tobi Records (2)                          1           TOB-013 https://api.discogs.com… 633407 Label  
#>  2 2      4490852 basic_information labels 1     La Vida Es Un Mus                         1           Mus70   https://api.discogs.com…  38322 Label  
#>  3 3      9827276 basic_information labels 1     La Vida Es Un Mus                         1           MUS118  https://api.discogs.com…  38322 Label  
#>  4 4      9769203 basic_information labels 1     La Vida Es Un Mus                         1           MUS132  https://api.discogs.com…  38322 Label  
#>  5 4      9769203 basic_information labels 2     Beat Generation                           1           BEAT64  https://api.discogs.com…  88198 Label  
#>  6 4      9769203 basic_information labels 3     Beat Generation                           1           BEAT 64 https://api.discogs.com…  88198 Label  
#>  7 5      7237138 basic_information labels 1     Katorga Works                             1           KW-043  https://api.discogs.com… 205895 Label  
#>  8 6     13117042 basic_information labels 1     High Fashion Industries                   1           HFI017  https://api.discogs.com… 637837 Label  
#>  9 7      7113575 basic_information labels 1     Mind Control Records (6)                  1           none    https://api.discogs.com… 763103 Label  
#> 10 8     10540713 basic_information labels 1     Not On Label (Phantom Head Self-released) 1           none    https://api.discogs.com… 879916 Label  
#> # … with 172 more rows, and abbreviated variable name ¹​entity_type_name
#> # ℹ Use `print(n = ...)` to see more rows

Data.frame to nested list

As an illustrative example, we revisit the long data.frame from the first section, obtained by melting the renewable energy shares of all Western European countries:

renewable_energy_melt_west_eu <- rrapply(
  renewable_energy_by_country,
  condition = \(x, .xparents) "Western Europe" %in% .xparents,
  how = "melt"
) 
head(renewable_energy_melt_west_eu, n = 10)
#>      L1     L2             L3            L4 value
#> 1 World Europe Western Europe       Austria 34.67
#> 2 World Europe Western Europe       Belgium  9.14
#> 3 World Europe Western Europe        France 14.74
#> 4 World Europe Western Europe       Germany 14.17
#> 5 World Europe Western Europe Liechtenstein 62.93
#> 6 World Europe Western Europe    Luxembourg 13.54
#> 7 World Europe Western Europe        Monaco    NA
#> 8 World Europe Western Europe   Netherlands  5.78
#> 9 World Europe Western Europe   Switzerland 25.49

For certain tasks, it may be necessary to convert this data.frame back to a nested list object, e.g. to write the data to a JSON or XML object, or for tree visualization purposes. Writing a recursive function to reconstruct the nested list can be quite time-consuming and error-prone.

In base R, the unlist() function has an inverse counterpart, relist(), that reconstructs a nested list from an unlisted vector. However, relist() always requires a skeleton list to repopulate. This can make it difficult to use in practice, and in the current example no such skeleton object is available. In particular, the melted data.frame contains only a subset of the original list elements, so the original list cannot serve as a template unless the corresponding nodes are filtered from it as well.
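
To make the skeleton requirement concrete, here is a minimal base-R sketch using a small made-up list (the names skeleton, flesh and rebuilt are purely illustrative). relist() can only rebuild the nesting when it is handed a template whose leaves line up one-to-one with the flat vector; without one, it stops with an error.

```r
## a small template list (illustrative only)
skeleton <- list(a = 1, b = list(c = 2, d = 3))

## new leaf values to pour into the template, in depth-first order
flesh <- c(10, 20, 30)

## relist() walks the skeleton and fills its leaves with the new values
rebuilt <- relist(flesh, skeleton = skeleton)
str(rebuilt)

## without a skeleton there is nothing to repopulate: relist() signals an error
no_skeleton <- tryCatch(relist(flesh), error = function(e) "error")
```

The catch is that the skeleton has to match the flat vector exactly, which is precisely what is missing when only a filtered, melted data.frame is at hand.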

Unmelt to nested list

To address this difficulty, rrapply() provides the dedicated option how = "unmelt", which performs the inverse operation of how = "melt". No skeleton object is needed in this case; only a plain data.frame in the format returned by how = "melt" is required. To illustrate, we can convert the melted data.frame above to a nested list as follows:

rrapply(
  renewable_energy_melt_west_eu, 
  how = "unmelt"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 1
#>   .. ..$ Western Europe:List of 9
#>   .. .. ..$ Austria      : num 34.7
#>   .. .. ..$ Belgium      : num 9.14
#>   .. .. ..$ France       : num 14.7
#>   .. .. ..$ Germany      : num 14.2
#>   .. .. ..$ Liechtenstein: num 62.9
#>   .. .. ..$ Luxembourg   : num 13.5
#>   .. .. ..$ Monaco       : num NA
#>   .. .. ..$ Netherlands  : num 5.78
#>   .. .. ..$ Switzerland  : num 25.5

Remark 1: how = "unmelt" uses a greedy approach that parses data.frame rows into list elements, starting from the top of the data.frame. That is, rrapply() keeps collecting child nodes as long as the parent node name remains unchanged. If, for instance, the goal is to create two separate nodes (at the same level) with the name "Western Europe", these nodes should not be listed directly after one another in the melted data.frame, since rrapply() would then group all their children under a single "Western Europe" list element.
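
The greedy grouping rule can be illustrated with a deliberately simplified, hypothetical toy_unmelt() written in base R. This is not how rrapply() is implemented internally, and it assumes complete L1, ..., Lk name columns (no missing levels) followed by a value column. Using rle() at each level makes the behaviour explicit: only consecutive rows that share a name end up under the same parent.

```r
## toy_unmelt(): hypothetical helper for illustration only, NOT rrapply's
## internal algorithm. Assumes name columns L1, ..., Lk without missing
## values, plus a "value" column holding the leaf values.
toy_unmelt <- function(df, level = 1L) {
  lcols <- grep("^L[0-9]+$", names(df), value = TRUE)
  ## past the last name column: return the leaf value
  if (level > length(lcols)) {
    return(df$value[[1L]])
  }
  ## rle() groups only *consecutive* identical names at this level
  runs <- rle(df[[lcols[level]]])
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  out <- vector("list", length(runs$values))
  names(out) <- runs$values
  for (i in seq_along(out)) {
    ## recurse into the block of rows sharing this parent name
    out[[i]] <- toy_unmelt(df[starts[i]:ends[i], , drop = FALSE], level + 1L)
  }
  out
}

## "Western Europe" rows are interrupted by a "Northern Africa" row
melted <- data.frame(
  L1 = c("Western Europe", "Western Europe", "Northern Africa", "Western Europe"),
  L2 = c("Austria", "Belgium", "Egypt", "France"),
  value = c(34.67, 9.14, 5.69, 14.74),
  stringsAsFactors = FALSE
)

nested <- toy_unmelt(melted)
names(nested)

## ordering the rows first brings all "Western Europe" rows together
nested_sorted <- toy_unmelt(melted[order(melted$L1), ])
names(nested_sorted)
```

Because the rows are not sorted, the first call produces two separate "Western Europe" nodes, whereas ordering by L1 beforehand (order() keeps ties in their original order) yields a single node with all three countries.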

Remark 2: Internally, how = "unmelt" reconstructs a nested list from the melted data.frame and subsequently follows the same conceptual framework as how = "replace". Any other function arguments, such as f and condition, can be used in exactly the same way as when applying how = "replace" to a nested list object.

Remark 3: how = "unmelt" does not (currently) restore the attributes of intermediate list nodes and is therefore not an exact inverse of how = "melt". The other way around always works: unmelting a melted data.frame and melting the result again reproduces the original data.frame:

renewable_energy_unmelt <- rrapply(renewable_energy_melt_west_eu, how = "unmelt")
renewable_energy_remelt <- rrapply(renewable_energy_unmelt, how = "melt")

identical(renewable_energy_melt_west_eu, renewable_energy_remelt)
#> [1] TRUE
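
The asymmetry follows from the melted format itself: each row stores only the node names along one path plus a leaf value, so there is no column in which an attribute attached to an intermediate node could be kept. The same kind of loss is easy to reproduce in base R with unlist(), which likewise flattens a list to paths (as names) and leaf values. This is an analogy in base R, not a demonstration of rrapply() itself; the object names are illustrative.

```r
## a list whose intermediate node carries an extra attribute
x <- list(a = structure(list(b = 1, c = 2), note = "metadata"))

## flattening keeps only the paths (encoded in the names) and the leaf values
flat <- unlist(x)
flat

## the attribute of the intermediate node "a" has no place to go
attributes(flat)
```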

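In terms of computational effort, rrapply()’s how = "unmelt" can be as efficient as, or considerably more efficient than, relist(), even though it has no template list object to repopulate. This is highlighted using the large list objects generated previously; in the timings below, how = "unmelt" is roughly 20 times faster for the deeply nested list and roughly 60 times faster for the shallow list: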

## large deeply nested list (2^18 elements)
## benchmark timing with rrapply
system.time(deep_unmelt <- rrapply(deep_melt, how = "unmelt"))
#>    user  system elapsed 
#>   0.142   0.000   0.143

## benchmark timing with relist
deep_unlist <- unlist(as.relistable(deep_list))
system.time(deep_relist <- relist(deep_unlist))
#>    user  system elapsed 
#>   2.978   0.000   2.978

## large shallow list (10^6 elements)
## benchmark timing with rrapply
system.time(shallow_unmelt <- rrapply(shallow_melt, how = "unmelt"))
#>    user  system elapsed 
#>   0.084   0.000   0.085

## benchmark timing with relist
shallow_unlist <- unlist(as.relistable(shallow_list))
system.time(shallow_relist <- relist(shallow_unlist))
#>    user  system elapsed 
#>   5.121   0.000   5.122

Note: the unmelted lists are not exactly identical to the original nested lists, since how = "unmelt" uses the placeholder names 1, 2, 3, etc. from the melted data.frames to name the nodes of the newly constructed lists, whereas the original lists are unnamed. After removing all names from the unmelted lists, they become identical to their original counterparts:

## remove all list names
deep_unmelt_unnamed <- rrapply(
  deep_unmelt,
  f = unname,
  classes = "list",
  how = "recurse"
)
## check if identical
identical(unname(deep_unmelt_unnamed), deep_list)
#> [1] TRUE
## remove all list names
shallow_unmelt_unnamed <- rrapply(
  shallow_unmelt,
  f = unname,
  classes = "list",
  how = "recurse"
)
## check if identical
identical(unname(shallow_unmelt_unnamed), shallow_list)
#> [1] TRUE
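
For readers without rrapply at hand, a similar name-stripping step can be sketched in base R. The helper strip_names() below is hypothetical and only roughly mirrors the rrapply() call above (unname() applied to every list node). It assumes plain lists without other attributes worth preserving, since lapply() does not carry them over.

```r
## strip_names(): hypothetical base-R helper that recursively drops the
## names of every list node; atomic leaves are returned unchanged
strip_names <- function(x) {
  if (is.list(x)) {
    x <- lapply(unname(x), strip_names)
  }
  x
}

## a small list with placeholder names similar to those used by how = "unmelt"
named <- list(`1` = list(`1` = 1, `2` = 2), `2` = list(`1` = 3))

## identical() also compares names, so the named version does not match ...
identical(named, list(list(1, 2), list(3)))

## ... but the stripped version does
identical(strip_names(named), list(list(1, 2), list(3)))
```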

References

The latest stable version of the rrapply-package is available on CRAN. Additional details and examples on how to use the rrapply() function can be found at https://jorischau.github.io/rrapply/ and a quick reference sheet can be downloaded from the github repository at https://github.com/JorisChau/rrapply/.

Session Info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
#>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] repurrrsive_1.0.0 tibble_3.1.7      tidyr_1.2.0       dplyr_1.0.9       rrapply_1.2.5    
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9        bslib_0.3.1       compiler_4.2.1    pillar_1.8.0      jquerylib_0.1.4   plyr_1.8.7        tools_4.2.1       digest_0.6.29    
#>  [9] jsonlite_1.8.0    evaluate_0.15     lifecycle_1.0.1   pkgconfig_2.0.3   rlang_1.0.4       bench_1.1.2       cli_3.3.0         rstudioapi_0.13  
#> [17] yaml_2.3.5        blogdown_1.10     xfun_0.31         fastmap_1.1.0     stringr_1.4.0     knitr_1.39        generics_0.1.3    sass_0.4.1       
#> [25] vctrs_0.4.1       tidyselect_1.1.2  data.table_1.14.2 glue_1.6.2        R6_2.5.1          fansi_1.0.3       profmem_0.6.0     rmarkdown_2.14   
#> [33] bookdown_0.27     purrr_0.3.4       reshape2_1.4.4    magrittr_2.0.3    htmltools_0.5.2   ellipsis_0.3.2    utf8_1.2.2        stringi_1.7.8

  1. The renewable_energy_by_country dataset is publicly available at the United Nations Open SDG Data Hub.

  2. Note that rrapply() uses a different column order than reshape2::melt(), and the "value" column may follow slightly different coercion rules; other than that, the melted data.frames are the same.

To leave a comment for the author, please follow the link and comment on their blog: R-bloggers | A Random Walk.
