1

Background: I have a BigQuery schema with several REPEATED levels and trying create UDF functions to facilitate querying.

Able to handle 1 level: when I'm dealing with only 1 REPEATED level I'm able to write the below UDF function which works:

enter image description here

CREATE TEMP FUNCTION struct_array_struct_scalar(data ANY TYPE) AS (
  (
    SELECT ARRAY_AGG(l1.element.id)
    FROM UNNEST(data.list) as l1
  )
);
SELECT struct_array_struct_scalar(my_field) as my_field_ids FROM my_table

The above works fine and returns an ARRAY of ids.

Struggling with multiple levels: now when I have multiple nested REPEATED fields this is where the challenge arises

enter image description here

Here the values I'm interested in are nested under my_field.list[].element.rates.list[].element.rate.double. There are only 2 REPEATED levels (list[]), everything else is nested under records so using the . notation to get them is trivial.

So far I have:

CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
  (
    SELECT ARRAY_AGG(l2.element.rate.double)
    FROM UNNEST(
      (SELECT l1.element.rates.list
      FROM UNNEST(data.list) AS l1)
    ) AS l2
  )
);

So I have:

  • 2 nested subqueries with UNNEST to handle the 2 REPEATED fields with respective aliases l1 and l2
  • . notation to go access nested records within those lists

Although the syntax seems good I get the following error:

Scalar subquery produced more than one element

Q: Can you please give me a clue as to what I need to change to return an ARRAY of rate.double?

Max
  • 12,794
  • 30
  • 90
  • 142

1 Answers1

1

Hope below give you some direction for your problem. (It might need to be modified depending on your real data)

CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
  ARRAY(
    SELECT l2.element.rate.double 
      FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
     WHERE l2.element.rate.double IS NOT NULL
  )
);

-- another udf which returns same result as above.
CREATE TEMP FUNCTION foo0(data ANY TYPE) AS ((
  SELECT ARRAY_AGG(l2.element.rate.double IGNORE NULLS)
    FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
));

SELECT foo(my_field) FROM my_table;

Query results

enter image description here

Sample Data

enter image description here

Jaytiger
  • 11,626
  • 2
  • 5
  • 15
  • 1
    Great stuff. FYI my data had NULL values and BigQuery would throw `Array cannot have a null element` error but adding `WHERE l2.element.rate.double IS NOT NULL` condition to the select fixed the issue – Max Mar 29 '23 at 11:06
  • 1
    @Max, sure you can discard `null` in `WHERE` clause and I've added another implementation which returns same result fyi. kindly check that out as well. thanks. – Jaytiger Mar 29 '23 at 11:52