In BigQuery how to concatenate arrays with distinct values in nested aggregations?

Question

I have a BigQuery SQL below. The source table is aggregated from another table and field x contains distinct values. To further aggregate on field name, the query unnests x and uses array_agg(distinct x) to merge the x arrays. It works, but it takes two steps to aggregate x and y separately. How to obtain the same results in a single step of aggregation on name?

with source AS (
  SELECT [1, 2, 3, 4] AS x, 10 AS y, 'john' AS name
  UNION ALL 
  SELECT [3, 4, 5, 6], 3.3, 'john'
  UNION ALL 
  SELECT [7, 8, 9], 12.7, 'jacob'
  ),

agg_x AS (
  select name, array_agg(distinct x) AS distinct_x_array
  from source, unnest(x) x
  group by name
),

agg_y AS (
select name, sum(y) sum_y
from source
group by name
)

select name, distinct_x_array, sum_y
FROM agg_x join agg_y using (name)

If your requirement is to use single query instead of creating to two cte table you can consider following query:`select name, distinct_x_array, sum_y FROM (select name, array_agg(distinct x) AS distinct_x_array from source, unnest(x) x group by name) join (select name, sum(y) sum_y from source group by name) using (name)`Let me know if this satisfies your requirement or not? — kiran mathew, Jun 28 '23 at 05:30

score 2 · Answer 1 · answered Jun 29 '23 at 18:12

Consider below approach (no joins involved)

select name, sum_y,
  array(select distinct x from t.x_array x) as distinct_x_array
from (
  select name, array_concat_agg(x) x_array, sum(y) sum_y
  from source group by name 
) t

if applied to sample data in your question - output is

score 2 · Accepted Answer · answered Jun 29 '23 at 18:22

2

Even better option (pure "single step")

select name, array_agg(distinct x) distinct_x_array, sum(if(offset=0,y,0)) sum_y
from source, unnest(x) x with offset
group by name

with output

answered Jun 29 '23 at 18:22

Mikhail Berlyant

165,386
8
154
230

In BigQuery how to concatenate arrays with distinct values in nested aggregations?

2 Answers2