3

We use QuickSight for visualizing cost data. GCP exports a lot of their billing data fields as arrays, but AWS QuickSight can't import arrays currently. So, I am trying to create an Athena view by pulling certain values out of the arrays and putting them into their own column.

An example is the project field, which has arrays within struct:

struct<id:string,number:string,name:string,labels:array<struct<key:string,value:string>>,ancestry_numbers:string,ancestors:array<struct<resource_name:string,display_name:string>>>

The labels array within the above project struct has a key: environment with different values. I want to pull that data into a new column so that I can filter all rows by environment values.

I can't figure out how to create a new column based on a specific array value. The following query returns an error:

SELECT *
FROM "database"."table" 
CROSS JOIN UNNEST(project.labels) AS t(labels)
where contains(labels.key, 'environment')
limit 10
Error: AMBIGUOUS_NAME: line 4:16: Column 'labels' is ambiguous

EDIT:

I am able to pull the labels array into a separate column with the following query:

SELECT "billing_account_id", labels
FROM "database"."table" 
LEFT JOIN UNNEST(project.labels) as labels
ON TRUE
limit 10

This gives me the following table:

billing_account_id labels
1234567890 [{key=environment, value=prod}]
1234567891 [{key=environment, value=dev}]

However, this still doesn't fully solve my question as I want to create a separate column that looks like this:

billing_account_id environment
1234567890 prod
1234567891 dev

Is there a way to do this all within the same query, or do I need to create an Athena view from the first query & then run a second UNNEST query to create a view that will give me the second table?

Any help is appreciated, thank you!

Vera
  • 31
  • 4

1 Answers1

0

As the docs say unnest will expand the array into single column, which you have named labels which matches the column name on the left side of the join (i.e. one column from the source - "database"."table"). You can rename the expanded column, for example into label (which seems more appropriate name):

SELECT *
FROM "database"."table" 
CROSS JOIN UNNEST(project.labels) AS t(label)
where strpos(label.key, 'environment') > 0
limit 10

Or use aliases:

SELECT id,
   number,
   -- s.labels if needed
   -- rest of columns from the source
   t.labels.key key,
   t.labels.value value
FROM "database"."table" s
CROSS JOIN UNNEST(project.labels) AS t(labels)
where strpos(t.labels.key, 'environment') > 0
limit 10

P.S.

Use strpos, contains is array function in Presto/Trino.

Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • Hi, I tried the top option and am now getting this error: `FUNCTION_NOT_FOUND: line 4:7: Unexpected parameters (varchar, varchar(11)) for function contains. Expected: contains(varchar, ipaddress), contains(array(t), t) T:comparable` There is no "ipaddress" in the table so I'm not sure where that is even coming from. – Vera Jul 18 '23 at 01:05
  • @Vera yes, that code was suspicious too. Use `strpos`. See the update. – Guru Stron Jul 18 '23 at 06:40