3

I am working on a MERGE process and update an array field with new data but only if the value isn't already found in the array.

target table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   |  [1,2,3] |
| b   |    [0]   |
+-----+----------+

source table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   | [3,4,5] |
| b   |  [0,0]   |
+-----+----------+

target table post-merge
+-----+-------------+
| id  |  arr_col    |
+-----+-------------+
| a   | [1,2,3,4,5] |
| b   |    [0]      |
+-----+-------------+

I was trying to use SQL on this answer in my MERGE statement

merge into target t
using source
  on target.id = source.id
when matched then
update set target.arr_col = array(
                             select distinct x 
                             from unnest(array_concat(target.arr_col, source.arr_col)) x
                            ) 

but BigQuery shows me the following error: Correlated Subquery is unsupported in UPDATE clause.

Is there any other way to update this array field via MERGE? The target and source tables can be quite large and would run daily. So it's a process I would like to have incremental updates for as opposed to recreating entire table with new data every time.

Korean_Of_the_Mountain
  • 1,428
  • 3
  • 16
  • 40

2 Answers2

1

Below is for BigQuery Standard SQL

merge into target
using (
  select id, 
    array(
      select distinct x 
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source 
  join target
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col;
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

Wanted to expand on Mikhail Berlyant's answer because my actual application differed a little bit from OP as I also needed to data to be inserted if merge conditions were not met.

merge into target
using (
  select id, 
    array(
      select distinct x 
      from unnest(
          /*  
          concat didn't work without case-when statement for 
          new data (i.e. target.id is null) 
          */
          case when target.id is not null then source.arr_col || target.arr_col 
          else source.arr_col
          end
      ) as x
      order by x
    ) as arr_col
  from source 
  left join target /* to be able to account for brand new data in source */
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col
when not matched insert row

;

Korean_Of_the_Mountain
  • 1,428
  • 3
  • 16
  • 40