BigQuery query optimisation - Unnest field to REPEATED STRUCT

Question

I currently have the below query which perfectly works, but I would like to know if it can be optimized (perhaps avoid to UNNEST firstly and GROUP BY secondly and make transformations in one step).

with src as (
    select 1 as row_key, "key_A:value_A,key_B:value_B,key_C:value_C" as field_raw
), tmp as (
    select
        row_key
        , STRUCT(
            split(field_items, ':')[offset(0)] as key
            , split(field_items, ':')[offset(1)] as value
        ) AS field_items
    from src
    , unnest(split(field_raw, ',')) field_items
)
select
    row_key
    , ARRAY_AGG(field_items) as field_items
from tmp
group by row_key

Input :

row_key	field_raw
1	key_A:value_A,key_B:value_B,key_C:value_C

Expected output :

row_key	field_items.key	field_items.value
1	key_A	value_A
	key_B	value_B
	key_C	value_C

Thanks for help :)

score 2 · Answer 1 · answered Dec 08 '21 at 17:50

Consider below refactored approach

select row_key, 
  array(select as struct
      split(kv, ':')[offset(0)] as key, 
      split(kv, ':')[offset(1)] as value
    from t.arr as kv 
  ) as field_items
from src, 
unnest([struct(regexp_extract_all(field_raw, r'\w+:\w+') as arr)]) t

if applied to sample data in your question - output is

BigQuery query optimisation - Unnest field to REPEATED STRUCT

1 Answers1