0

I currently have the below query which perfectly works, but I would like to know if it can be optimized (perhaps avoid to UNNEST firstly and GROUP BY secondly and make transformations in one step).

with src as (
    select 1 as row_key, "key_A:value_A,key_B:value_B,key_C:value_C" as field_raw
), tmp as (
    select
        row_key
        , STRUCT(
            split(field_items, ':')[offset(0)] as key
            , split(field_items, ':')[offset(1)] as value
        ) AS field_items
    from src
    , unnest(split(field_raw, ',')) field_items
)
select
    row_key
    , ARRAY_AGG(field_items) as field_items
from tmp
group by row_key

Input :

row_key field_raw
1 key_A:value_A,key_B:value_B,key_C:value_C

Expected output :

row_key field_items.key field_items.value
1 key_A value_A
key_B value_B
key_C value_C

Thanks for help :)

JohnDu17
  • 37
  • 1
  • 6

1 Answers1

2

Consider below refactored approach

select row_key, 
  array(select as struct
      split(kv, ':')[offset(0)] as key, 
      split(kv, ':')[offset(1)] as value
    from t.arr as kv 
  ) as field_items
from src, 
unnest([struct(regexp_extract_all(field_raw, r'\w+:\w+') as arr)]) t    

if applied to sample data in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230