Lists to Data.Frames with imap
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When working with data that comes from JSON converted into a list of lists of lists of lists … (you know what I mean ;-)), I often want to convert it to a data.frame.
Unfortunately there’s often a list in the source data which is unnamed.
Or the list in one row is longer than the one in another row. So converting it straightforwardly into a data.frame or tibble fails with the error message `Tibble columns must have compatible sizes.`
So what to do? Just leave the lists as values in the cells of the data.frame, i.e. use list-columns.
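A minimal sketch of both situations, using a hypothetical single record `json_row` shaped like the sample data below: `as_tibble()` rejects the differing field lengths, while wrapping each multi-element field in `list()` yields one row with list-columns.

```r
library(tibble)

# Hypothetical parsed JSON record: fields of length 1, 4 and 2
json_row <- list(a = 42, b = list("one", "two", "three", "four"), c = list("R", "python"))

# Direct conversion fails because only size-1 columns are recycled
direct <- tryCatch(as_tibble(json_row), error = function(e) e)
conditionMessage(direct)

# Wrapping the multi-element fields in list() gives every column size 1:
# one row, with <list> cells holding the original lists
one_row <- tibble(a = json_row$a, b = list(json_row$b), c = list(json_row$c))
one_row
```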
Let’s have a look at some sample data:
Sample data
```r
options(tidyverse.quiet = TRUE)
library(tidyverse)

row_1 <- list(
  a = 42,
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

row_2 <- list(
  a = 3.14159,
  b = list("A", "B"),
  c = list("Montana", "Ohio", "California")
)

source <- list(row_1, row_2)
```
So we have a list `source` which contains two entries. Both are lists in their own right: `row_1` and `row_2`.
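To see the mismatch behind the error message, base R's `lengths()` counts the elements of each field. Applying it to every row (a short sketch that re-creates the sample data so it runs on its own) shows that field sizes differ within a row and between rows, and that `source` itself carries no names:

```r
row_1 <- list(a = 42, b = list("one", "two", "three", "four"), c = list("R", "python"))
row_2 <- list(a = 3.14159, b = list("A", "B"), c = list("Montana", "Ohio", "California"))
source <- list(row_1, row_2)

# Number of elements in each field, per row
field_lengths <- lapply(source, lengths)
field_lengths

# The outer list is unnamed; only the fields inside each row have names
names(source)
```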
Goal
As a result we want to get a data.frame (or tibble):
```r
target <- tribble(
  ~a,      ~b,                                   ~c,
  42,      list("one", "two", "three", "four"), list("R", "python"),
  3.14159, list("A", "B"),                      list("Montana", "Ohio", "California")
)
target
```
```
## # A tibble: 2 × 3
##       a b          c
##   <dbl> <list>     <list>
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>
```
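The same target can also be built column by column with `tibble()`, which makes the idea explicit: the numbers form an ordinary `<dbl>` column, while each row's nested list becomes one cell of a list-column. A minimal sketch (the name `target_cols` is just for illustration):

```r
library(tibble)

target_cols <- tibble(
  a = c(42, 3.14159),
  # each element of the outer list() is one cell, i.e. one row
  b = list(list("one", "two", "three", "four"), list("A", "B")),
  c = list(list("R", "python"), list("Montana", "Ohio", "California"))
)
target_cols

# Number of elements stored in each cell of column b
lengths(target_cols$b)
```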
purrr::imap
Let’s start with a single row.
The idea is to iterate over each element of `row_1`, so the `purrr::map*` family seems to be the function family of choice. But these functions iterate only over the values of the list; they don’t pass the name of each element.
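A small comparison makes the difference visible. The sketch below uses `map_chr()`, whose function sees only the values, and `imap_chr()`, whose function also receives each name as `.y`; for an unnamed list, `.y` is the element's position instead. The labels are only illustrative:

```r
library(purrr)

row_1 <- list(a = 42, b = list("one", "two", "three", "four"), c = list("R", "python"))

# map_chr(): the function only sees the value .x
# (the names on the result are copied from the input list, not passed in)
classes <- map_chr(row_1, ~ class(.x))
classes

# imap_chr(): the function also gets the element's name as .y
labels <- imap_chr(row_1, ~ paste0(.y, " has ", length(.x), " element(s)"))
labels

# Unnamed list: .y is the position (1, 2, ...)
positions <- imap_chr(list("x", "y"), ~ paste0("#", .y, ": ", .x))
positions
```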
So we need `purrr::imap`. It calls the processing function with two arguments for each element, the value (`.x`) and the name (`.y`, or the position if the list has no names):
```r
row_1 %>%
  purrr::imap_dfc(~ tibble({{.y}} := list(.x)))
```
```
## # A tibble: 1 × 3
##   a         b          c
##   <list>    <list>     <list>
## 1 <dbl [1]> <list [4]> <list [2]>
```
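The `{{.y}} :=` part is what turns the element name into the column name. The same step can be sketched with a named function instead of a formula lambda and with `!!` injection on the left-hand side of `:=`. `as_list_column()` is a hypothetical helper, and, as with the snippet above, `imap_dfc()` needs dplyr to be installed because purrr binds the columns via dplyr:

```r
library(purrr)
library(tibble)

row_1 <- list(a = 42, b = list("one", "two", "three", "four"), c = list("R", "python"))

# Hypothetical helper: one field -> a one-row tibble with a single list-column.
# `!!name :=` uses the string stored in `name` as the column name.
as_list_column <- function(value, name) {
  tibble(!!name := list(value))
}

# imap_dfc() calls as_list_column(value, name) for every field
# and binds the resulting one-column tibbles side by side
one_row <- imap_dfc(row_1, as_list_column)
one_row
```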
Okay, that seems pretty good. But the first column shouldn’t be a list. Here we want a normal column.
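```r
row_1 %>%
  # list() makes multi-element values list-columns; if/else keeps single values as they are
  # (length(.x) > 1 is a single TRUE/FALSE; ifelse() would also drop attributes such as a Date class)
  purrr::imap_dfc(~ tibble({{.y}} := if (length(.x) > 1) list(.x) else .x))
```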
```
## # A tibble: 1 × 3
##       a b          c
##   <dbl> <list>     <list>
## 1    42 <list [4]> <list [2]>
```
That’s really nice. So how do we process the whole list `source`?
We use another member of the `purrr::map*` family, `purrr::map_dfr`, which binds the one-row tibbles together row by row:
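```r
result <- source %>%
  # outer map_dfr(): one row per element of source, bound together by row
  purrr::map_dfr(
    # inner imap_dfc(): one column per field of the current row
    ~ .x %>%
      purrr::imap_dfc(~ tibble({{.y}} := if (length(.x) > 1) list(.x) else .x))
  )
result
```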
```
## # A tibble: 2 × 3
##       a b          c
##   <dbl> <list>     <list>
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>
```
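To reuse the approach, the two nested lambdas can be pulled out into named functions. The following is a minimal sketch, not the post's own code: `cell_value()`, `row_to_tibble()` and `rows_to_tibble()` are hypothetical helper names, and the example rows with dates are made up. It also shows one design choice: since `length(x) > 1` is a single TRUE/FALSE, plain `if`/`else` is used, because `ifelse()` builds its result from the shape of the test and drops attributes such as the `Date` class of the selected value. As before, `map_dfr()` and `imap_dfc()` need dplyr to be installed.

```r
library(purrr)
library(tibble)

# Hypothetical helper: multi-element values become one list-column cell,
# single values are returned unchanged (attributes such as a Date class survive)
cell_value <- function(x) if (length(x) > 1) list(x) else x

# Hypothetical helper: one row (named list) -> one-row tibble
row_to_tibble <- function(row) {
  imap_dfc(row, function(value, name) tibble(!!name := cell_value(value)))
}

# Hypothetical helper: list of rows -> tibble with one row per list element
rows_to_tibble <- function(rows) map_dfr(rows, row_to_tibble)

# Made-up example data with a date field
rows <- list(
  list(when = as.Date("2024-01-01"), tags = list("x", "y")),
  list(when = as.Date("2024-02-01"), tags = list("z", "w", "v"))
)
result <- rows_to_tibble(rows)
result

# For comparison: ifelse() returns the bare number behind the date
plain <- ifelse(FALSE, list(1), as.Date("2024-01-01"))
plain
```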