1

I'm writing some queries with Google BigQuery. I want to extract the JSON from an array. Somehow I can extract it. And @Mikhail Berlyant helped me here. But now the problem is in extracting JSON from the array without Duplicates.

Current Structure:

enter image description here

I what I tried:

WITH
  cte AS (
  SELECT
    labels,
    cost
  FROM
    BILLING.gcp_billing_export_v1)
SELECT
  la,
  cost
FROM
  cte,
  UNNEST(labels) AS la

enter image description here

See the cost box, the COST value is repeated twice, because we have 2 KEY, VALUE pairs in the array.

So while doing sum(cost) with the group by la.key then I'm getting the wrong value.

What Im looking for is,

enter image description here

Can anyone help me with this?

TheDataGuy
  • 2,712
  • 6
  • 37
  • 89

1 Answers1

3

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  description, 
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(kv, '$.key') key, 
      JSON_EXTRACT_SCALAR(kv, '$.value') value 
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp, 
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM `project.dataset.table`   

You can test, play with above using excerpt of dummy data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'aaa' description, '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]' labels, 0.323316 cost UNION ALL
  SELECT 'bbb' description, '[{"key":"application2","value":"scaled-server2"},{"key":"department2","value":"hrd2"}]' labels, 0.342825 cost 
)
SELECT 
  description, 
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(kv, '$.key') key, 
      JSON_EXTRACT_SCALAR(kv, '$.value') value 
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp, 
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM `project.dataset.table`   

with result

Row description labels.key      labels.value    cost     
1   aaa         application     scaled-server   0.323316     
                department      hrd      
2   bbb         application2    scaled-server2  0.342825     
                department2     hrd2         
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks for your amazing queries, it works, but while apply the same query on my actual dataset im getting this error `No matching signature for function SPLIT for argument types: ARRAY>, STRING. Supported signatures: SPLIT(STRING, [STRING]); SPLIT(BYTES, BYTES) at [20:17]` . Line 20 is ` FROM UNNEST(SPLIT(labels, '},{')) kv_temp, ` – TheDataGuy Nov 19 '18 at 18:49
  • in your question you provided example of data - see `Current Structure` section. so my answer is based on what you provided in the question – Mikhail Berlyant Nov 19 '18 at 18:58
  • I used `TO_JSON_STRING(labels)` to get that column. Is that fine? – TheDataGuy Nov 19 '18 at 19:00
  • I changed it to `FROM UNNEST(SPLIT(TO_JSON_STRING(labels), '},{')) kv_temp` . now its working – TheDataGuy Nov 19 '18 at 19:05
  • Hey @Mikhail, If I want to do group by 1,2 3 for sum (c0st) how this works? – TheDataGuy Nov 19 '18 at 20:06
  • you should post new question and provide all details of what you need including input data and expected output - this will allow us to answer your new question – Mikhail Berlyant Nov 19 '18 at 20:09
  • https://stackoverflow.com/questions/53382005/bigquery-group-by-on-arrays – TheDataGuy Nov 19 '18 at 20:16
  • Thanks! this save my time. – Md Sirajus Salayhin Jul 11 '19 at 09:09