Site icon R-bloggers

JSON, NULL values and as_tibble

[This article was first published on rstats-tips.net, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When working with data provided by common APIs you will almost always get in contact with JSON formatted data. Using R’s rjson::fromJSON will transform JSON to R’s lists. So far so good. Converting those lists to tibble using tibble::as_tibble will fail when the JSON (and therefor the list) contains NULL values. So you havve to replace them before building the tibble.

Example

But let’s start with an example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
library(tidyverse)
library(rjson)

data_json <- '
[
  {
    "name": "Tim",
    "age": 34,
    "hobby": "footbal"
  },
  {
    "name": "Tom",
    "age": 22,
    "hobby": "baseball"
  },    
  {
    "name": "Shelly",
    "age": 21,
    "hobby": "cycling"
  }  
]
'
parsed_json <- rjson::fromJSON(data_json)

parsed_json %>% str()
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## List of 3
##  $ :List of 3
##   ..$ name : chr "Tim"
##   ..$ age  : num 34
##   ..$ hobby: chr "footbal"
##  $ :List of 3
##   ..$ name : chr "Tom"
##   ..$ age  : num 22
##   ..$ hobby: chr "baseball"
##  $ :List of 3
##   ..$ name : chr "Shelly"
##   ..$ age  : num 21
##   ..$ hobby: chr "cycling"

In in this example we parse the JSON using rjson::fromJSON() into a list of lists. Each sublist contains the value of a person.

We can convert this list of lists into a tibble with some purrr::map_dfr() magic:

1
2
parsed_json %>% 
  purrr::map_dfr(~ tibble::as_tibble(.x))
1
2
3
4
5
6
## # A tibble: 3 × 3
##   name     age hobby   
##   <chr>  <dbl> <chr>   
## 1 Tim       34 footbal 
## 2 Tom       22 baseball
## 3 Shelly    21 cycling

Example with null-values

But what happens if there are null-values in the JSON file?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
data_json <- '
[
  {
    "name": "Tim",
    "age": 34,
    "hobby": "footbal"
  },
  {
    "name": "Tom",
    "age": null,
    "hobby": "baseball"
  },    
  {
    "name": "Shelly",
    "age": 21,
    "hobby": null
  }  
]
'
parsed_json <- rjson::fromJSON(data_json)

parsed_json %>% str()
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## List of 3
##  $ :List of 3
##   ..$ name : chr "Tim"
##   ..$ age  : num 34
##   ..$ hobby: chr "footbal"
##  $ :List of 3
##   ..$ name : chr "Tom"
##   ..$ age  : NULL
##   ..$ hobby: chr "baseball"
##  $ :List of 3
##   ..$ name : chr "Shelly"
##   ..$ age  : num 21
##   ..$ hobby: NULL

As you can see the age of the second entry and the hobby of the third one are null. They get parsed to NULL.

These lead to an error when converting the list of list to a tibble:

1
2
parsed_json %>% 
  purrr::map_dfr(~ tibble::as_tibble(.x))
1
2
3
4
5
## Error in `map()`:
## ℹ In index: 2.
## Caused by error in `tibble::as_tibble()`:
## ! All columns in a tibble must be vectors.
## ✖ Column `age` is NULL.

Solution

So I suggest to use a custom function which converts NULL values to NAs and then converts the entry to a tibble

1
2
3
4
5
6
7
8
fix_NULL <- function(variables) {
  variables <- variables %>%
    map(~ ifelse(is.null(.x), NA, .x))
  tibble::as_tibble(variables)
}

parsed_json %>% 
  purrr::map_dfr(~ fix_NULL(.x))
1
2
3
4
5
6
## # A tibble: 3 × 3
##   name     age hobby   
##   <chr>  <dbl> <chr>   
## 1 Tim       34 footbal 
## 2 Tom       NA baseball
## 3 Shelly    21 <NA>
To leave a comment for the author, please follow the link and comment on their blog: rstats-tips.net.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version