0

I have a json string in this format:

{"key1":
 {"key1.1":val, "key1.2":val, ...},
"key2":
 {"key2.1":val, "key2.2":val, ...}}

I am looking for a way to query this string and extract every base key-value pair as a column. So the expected result would look something like:

Row  key1.1   key1.2  ....  key2.1   key2.2  ....
1    val      val           val      val   
2    val      val           val      val

One method would be to use JSON_VALUE for each key, but I am trying to find a more efficient method that can be applied to any JSON string. I have thought about using a LOOP method, but not sure how to iterate over each key-value pair in a json string in bigquery.

Ken White
  • 123,280
  • 14
  • 225
  • 444
Roark
  • 69
  • 6
  • 2
    bigquery is not mysql, please don't tag both – ysth May 10 '22 at 02:05
  • https://stackoverflow.com/questions/34890339/how-to-extract-all-the-keys-in-a-json-object-with-bigquery looks helpful? – ysth May 10 '22 at 02:07
  • That gives me the keys required, but it doesn't help to put each one as its own column, along with the rows being values. – Roark May 10 '22 at 03:12
  • 1
    oh, each one in its own column? you will need "dynamic sql" for this; from https://towardsdatascience.com/how-to-use-dynamic-sql-in-bigquery-8c04dcc0f0de it looks like in bigquery you can use "EXECUTE IMMEDIATE" for that – ysth May 10 '22 at 14:36

1 Answers1

2

Consider below

create temp function  extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function  extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;
create temp function extract_all_leaves(input string) returns string language js as '''
  function flattenObj(obj, parent = '', res = {}){
    for(let key in obj){
        let propName = parent ? parent + '.' + key : key;
        if(typeof obj[key] == 'object'){
            flattenObj(obj[key], propName, res);
        } else {
            res[propName] = obj[key];
        }
    }
    return JSON.stringify(res);
  }
  return flattenObj(JSON.parse(input));
''';
create temp table flatten_table as 
  select json, offset, key, value
  from your_table, 
  unnest([struct(extract_all_leaves(json) as kv)]),
  unnest(extract_keys(kv)) as key with offset
  join unnest(extract_values(kv)) as value with offset
  using(offset);

execute immediate (select '''
  select * except(json) from (select * except(offset) from flatten_table)
  pivot (any_value(value) for replace(key, '.', '_') in (''' || keys_list || '''
  ))'''
from (select string_agg('"' || replace(key, '.', '_') || '"', ',' order by offset) keys_list from (
  select key, min(offset) as offset from flatten_table group by key
))
);            

if applied to sample data as in your question

create temp table your_table as (
  select '''{
    "key1": {"key1.1":"val11", "key1.2":"val12"}, 
    "key2": {"key2.1":"val21", "key2.2":"val22"}
  }''' json
);            

output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks Mikhail, this solution works however on GCP, it doesn't allow me to save this as a BigQuery view as it contains temp functions that are not supported inside views. I didn't specify this in the question so that's my fault. I wonder if there is a solution without using create temp function. – Roark May 10 '22 at 06:51
  • 1
    You can create persistent functions instead of temp functions and call them in your view like any other built-in function. – Jaytiger May 10 '22 at 10:07