1

I would like to query BigQuery where two columns are returned, a distinct object ID column, and a distinct combination of array values based on values of the arrays where the object IDs were the same.

For example, say I have this table in BigQuery:

+-----------+-----------+
| object_id |  array    |
+-----------+-----------+
| a         | 1,2,3,4,5 |
| a         | 2,3,4,5,6 |
| b         | 9,8,7,6   |
+-----------+-----------+

I would like to run a query that returns the following data:

+-----------+-------------+
| object_id |  array      |
+-----------+-------------+
| a         | 1,2,3,4,5,6 |
| b         | 9,8,7,6     |
+-----------+-------------+

I think the query should be something similar to this but am a little unclear on how to get the distinct object ID with the distinct combination of array elements.

walshbm15
  • 113
  • 1
  • 11

2 Answers2

1

Consider below [slim] solution

select id, array_agg(distinct el) as arr
from `project.dataset.table`,
unnest(arr) el
group by id  
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

Use array_concat_agg:

WITH `project.dataset.table` AS (
  SELECT 'a' id, [1,2,3,4,5] arr UNION ALL
  SELECT 'a', [2,3,4,5,6] UNION ALL
  SELECT 'b', [9,8,7,6]
)
SELECT
    id,
    ARRAY(SELECT DISTINCT x FROM UNNEST(concatenated) as x) as a
FROM (
  SELECT
    id,
    ARRAY_CONCAT_AGG(arr) as concatenated
  FROM `project.dataset.table`
  GROUP BY id
)

enter image description here

Sergey Geron
  • 9,098
  • 2
  • 22
  • 29