Background: I have a BigQuery schema
with several REPEATED
levels and trying create UDF functions
to facilitate querying.
Able to handle 1 level: when I'm dealing with only 1 REPEATED level I'm able to write the below UDF function which works:
CREATE TEMP FUNCTION struct_array_struct_scalar(data ANY TYPE) AS (
(
SELECT ARRAY_AGG(l1.element.id)
FROM UNNEST(data.list) as l1
)
);
SELECT struct_array_struct_scalar(my_field) as my_field_ids FROM my_table
The above works fine and returns an ARRAY
of ids.
Struggling with multiple levels: now when I have multiple nested REPEATED
fields this is where the challenge arises
Here the values I'm interested in are nested under my_field.list[].element.rates.list[].element.rate.double
. There are only 2 REPEATED
levels (list[]), everything else is nested under records so using the .
notation to get them is trivial.
So far I have:
CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
(
SELECT ARRAY_AGG(l2.element.rate.double)
FROM UNNEST(
(SELECT l1.element.rates.list
FROM UNNEST(data.list) AS l1)
) AS l2
)
);
So I have:
- 2 nested subqueries with
UNNEST
to handle the 2REPEATED
fields with respective aliasesl1
andl2
.
notation to go access nested records within those lists
Although the syntax seems good I get the following error:
Scalar subquery produced more than one element
Q: Can you please give me a clue as to what I need to change to return an ARRAY of rate.double?