JSON, NULL values and as_tibble
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When working with data provided by common APIs you will almost always get in contact with JSON formatted data. Using R’s rjson::fromJSON
will transform JSON to R’s lists.
So far so good. Converting those lists to tibble
using tibble::as_tibble
will fail when the JSON (and therefor the list) contains NULL values. So you havve to replace them before building the tibble.
Example
But let’s start with an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
library(tidyverse) library(rjson) data_json <- ' [ { "name": "Tim", "age": 34, "hobby": "footbal" }, { "name": "Tom", "age": 22, "hobby": "baseball" }, { "name": "Shelly", "age": 21, "hobby": "cycling" } ] ' parsed_json <- rjson::fromJSON(data_json) parsed_json %>% str() |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
## List of 3 ## $ :List of 3 ## ..$ name : chr "Tim" ## ..$ age : num 34 ## ..$ hobby: chr "footbal" ## $ :List of 3 ## ..$ name : chr "Tom" ## ..$ age : num 22 ## ..$ hobby: chr "baseball" ## $ :List of 3 ## ..$ name : chr "Shelly" ## ..$ age : num 21 ## ..$ hobby: chr "cycling" |
In in this example we parse the JSON using rjson::fromJSON()
into a list of lists. Each sublist contains the value of a person.
We can convert this list of lists into a tibble
with some purrr::map_dfr()
magic:
1 2 |
parsed_json %>% purrr::map_dfr(~ tibble::as_tibble(.x)) |
1 2 3 4 5 6 |
## # A tibble: 3 × 3 ## name age hobby ## <chr> <dbl> <chr> ## 1 Tim 34 footbal ## 2 Tom 22 baseball ## 3 Shelly 21 cycling |
Example with null-values
But what happens if there are null-values in the JSON file?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
data_json <- ' [ { "name": "Tim", "age": 34, "hobby": "footbal" }, { "name": "Tom", "age": null, "hobby": "baseball" }, { "name": "Shelly", "age": 21, "hobby": null } ] ' parsed_json <- rjson::fromJSON(data_json) parsed_json %>% str() |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
## List of 3 ## $ :List of 3 ## ..$ name : chr "Tim" ## ..$ age : num 34 ## ..$ hobby: chr "footbal" ## $ :List of 3 ## ..$ name : chr "Tom" ## ..$ age : NULL ## ..$ hobby: chr "baseball" ## $ :List of 3 ## ..$ name : chr "Shelly" ## ..$ age : num 21 ## ..$ hobby: NULL |
As you can see the age
of the second entry and the hobby of the third one are null
. They get parsed to NULL
.
These lead to an error when converting the list of list to a tibble:
1 2 |
parsed_json %>% purrr::map_dfr(~ tibble::as_tibble(.x)) |
1 2 3 4 5 |
## Error in `map()`: ## ℹ In index: 2. ## Caused by error in `tibble::as_tibble()`: ## ! All columns in a tibble must be vectors. ## ✖ Column `age` is NULL. |
Solution
So I suggest to use a custom function which converts NULL
values to NAs and then converts the entry to a tibble
1 2 3 4 5 6 7 8 |
fix_NULL <- function(variables) { variables <- variables %>% map(~ ifelse(is.null(.x), NA, .x)) tibble::as_tibble(variables) } parsed_json %>% purrr::map_dfr(~ fix_NULL(.x)) |
1 2 3 4 5 6 |
## # A tibble: 3 × 3 ## name age hobby ## <chr> <dbl> <chr> ## 1 Tim 34 footbal ## 2 Tom NA baseball ## 3 Shelly 21 <NA> |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.