load bigquery JSON data dump into R tibble

Question

I have downloaded a JSON extract from Big Query which has nested and repeated fields (similar to the package bigrquery) and am attempting to further manipulate the resulting tibble.

I have the following code to load from JSON and convert to a tibble

library(tidyverse)
ga.list <- lapply(readLines("temp.json"), jsonlite::fromJSON, flatten = TRUE)
ga.df <- tibble(dat = ga.list) %>% 
    unnest_wider(dat) %>% 
    mutate(id = row_number()) %>%
         unnest_wider(b_nested) %>%
         unnest_wider(b3) %>% 
         unnest_wider(b33)

So there were two list columns:

b_nested, this column is a nested list (which I unnested recursively .. maybe there is a more automated way, if so, please advise!)
rr1 and rr2, these columns will always have the same number of elements. So elements 1 of rr1 and rr2 should be read together.

I am still working out how to extract id, rr1 and rr2 and make into a long table with repeated rows for each id row.

Note: this question has been edited a few times as I progress further along .. originally I had got stuck getting it from JSON to a tibble until I found unnest_wider()

temp.json:

{"a":"4000","b_nested":{"b1":"(not set)","b2":"some - text","b3":{"b31":"1591558980","b32":"60259425255","b33":{"b3311":"133997175"},"b4":false},"b5":true},"rr1":[],"rr2":[]} {"a":"4000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true}, "rr1":["v1","v2","v3"],"rr2":["x1","x2","x3"]} {"a":"6000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true},"rr1":["v1","v2","v3","v4","v5"],"rr2":["aja1","aja2","aja3","aja14","aja5"]}

I would like to see a sample row from your JSON output as you stated in the question. — Sabri Karagönen, Apr 14 '20 at 10:05

user1420372 · Accepted Answer · 2020-04-14T12:28:57.047

The final piece of the puzzle; in order to get the repeating rows for repeating record

  ga.df %>% select(id, rr1, rr2)  %>%
  unnest(cols = c(rr1, rr2))

FYI: Link to Big Query Specifying nested and repeated columns

Another solution (my preference) would be to create a tibble from rr1 and rr1 and keep as a column in ga.df so that purrr functions can be used

    ga.df %>% 
  mutate(rr = map2(rr1, rr2, function(x,y) {
    tibble(rr1 = x, rr2 = y)
  })) %>%
  select(-rr1, -rr2) %>%
  mutate(rr_length = map_int(rr, ~nrow(.x)))

load bigquery JSON data dump into R tibble

1 Answers1