0

Im working on GCP Billing queries in BQ. But while extracting array with the cost I'm getting wrong values like unnest returns array elements in row format. So if I have 2 elements in an array for a single row then I'll get 2 rows.

EG:

Actual Array:

SELECT

TO_JSON_STRING(labels), cost

FROM

billing_export.gcp_billing_export

WHERE

_PARTITIONTIME >= "2018-08-01 00:00:00"

AND _PARTITIONTIME < "2018-09-01 00:00:00"

AND billing_account_id = "xxx-62378F-xxx"

AND TO_JSON_STRING(labels) = '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]'

and cost> 0 limit 10

enter image description here

with Unnest:

with cte as (SELECT

labels, cost

FROM

billing_export.gcp_billing_export

WHERE

_PARTITIONTIME >= "2018-08-01 00:00:00"

AND _PARTITIONTIME < "2018-09-01 00:00:00"

AND billing_account_id = "xxx-62378F-xxxx"

AND TO_JSON_STRING(labels) = '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]'

and cost> 0

limit 10 )

select labels,cost from cte ,

UNNEST(labels) AS la

enter image description here

Question:

I don't want duplicate cost value, Can anyone help me with this query?

Community
  • 1
  • 1
TheDataGuy
  • 2,712
  • 6
  • 37
  • 89

1 Answers1

3

instead of

SELECT labels,cost from cte ,
UNNEST(labels) AS la   

try

SELECT la, cost from cte ,
UNNEST(labels) AS la   

Update

SELECT 
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(kv, '$.key') key, 
      JSON_EXTRACT_SCALAR(kv, '$.value') value 
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp, 
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM cte
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • This query shows, again duplicates. Previous result: https://imgur.com/tA6vCi3 . using the above query: https://imgur.com/N7e8Hr4 . The cost field is duplicate on both times. – TheDataGuy Nov 19 '18 at 06:16
  • i hope, you at least recognized that above answer points you on one of the issue you have of using `labels` instead of `la` - if this is a case - consider voting up. meantime - can you explain the logic of where you expect to see cost and where not and why - so we can help you in expressing that logic via sql – Mikhail Berlyant Nov 19 '18 at 17:09
  • yeah upvoted. I explained the complete requirement here: https://www.reddit.com/r/bigquery/comments/9yd8w2/unnest_an_array/ or I have another question here: https://stackoverflow.com/questions/53372423/sql-extract-json-from-array – TheDataGuy Nov 19 '18 at 17:29
  • unfortunately it is still not clear. mostly it looks like you have already had your labels as an repeated field and then you are trying to "reverse' this back to array. what the reason? you should elaborate in more details on your use case - otherwise it will be just guessing on our side – Mikhail Berlyant Nov 19 '18 at 17:55
  • I want to get the billing for the resources based on Labels So, If I have 2 different labels, then while Unnesting BQ extract 2 labels in row 2rows and add the cost for that resource for those 2 rows eg: I have a vm tagged with `env: prod` `database: mysql` the total cost for this vm is $10 now the actual billing has this data like below Column 1(lables): `[{"key": "env", "value":"prod"},{"key":"database","value":mysql"}]` Column 2 (cost): 10 but if do unnest ```select la.key,la.value ,cost from my_tbl , UNNEST(labels) AS la group by la.key,la.value, cost``` – TheDataGuy Nov 19 '18 at 18:10
  • it is still not 100% clear about back and forth with arrays but hope you have good reason for it - meantime see update in answer – Mikhail Berlyant Nov 19 '18 at 18:26