2

I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual documents that conceptionally would map back to a "data frame row" when reading/parsing the JSON in R.

How can I ensure that all data frames are casted into tibbles when retrieving data from the database?

Desired result for example data below

Desired result

query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>% 
  tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>% 
  purrr::map(tibble::as_tibble)

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

If I try to do that via dplyr::mutate() or purrr::map*_df(), I get the Error: Columnis of unsupported class data.frame error.

Related post

Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON


Example

JSON data to put into file dump.json

{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}

Importing JSON into MongoDB

con <- mongolite::mongo(
  db = "stackoverflow",
  collection = "nested_json"
)

con$import(file("dump.json"))

This is what you should see within MongoDB

enter image description here

Query via $find()

query_res <- con$find() %>%
  tibble::as_tibble()

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

Query via $iterate()

it <- con$iterate()

iter_res <- list()
while(!is.null(x <- it$one())) {
  # Ensure array columns stay individual list columns when casting to tibble:
  # (As opposed to multiple array items being turned into one tibble row)
  p <- function(x) {
    is.list(x) &&
      is.null(names(x))
  }
  f <- function(x) {
    list(x %>% unlist())
  }
  x <- x %>% purrr::map_if(p, f)

  # Necessary to get the `simplifyVector = TRUE` effect:
  iter_res_current <- x %>%
    jsonlite:::simplify() %>%
    tibble::as_tibble()

  # Combine with previous iteration results:
  iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
  dplyr::bind_rows()

iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:List of 2
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  1 2
#   .. .. ..$ z: logi  TRUE FALSE
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  10 20
#   .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"
Rappster
  • 12,762
  • 7
  • 71
  • 120
  • to clarify, are you wanting each nested data.frame to be cast to a tibble, whilst retaining the nested structure of the result? – SymbolixAU Jan 14 '20 at 04:27
  • @SymbolixAU yes, that's exactly what I want, sorry if that didn't come across (I was lost in nested structures and recursive programming approaches yesterday). I'd be okay with the general rule of "always tibble instead of data frame", I don't necessarily need to drop certain nesting levels - as (accidentally) seemed to have happened to `levelTwo` in my `$iterat()` approach – Rappster Jan 14 '20 at 08:52
  • @SymbolixAU: I've added the desired structure right below the question – Rappster Jan 14 '20 at 09:26

0 Answers0