16

I have two arrays in PostgreSQL that I need to union. For example:

{1,2,3} union {1,4,5} would return {1,2,3,4,5}

Using the concatenate (||) operator would not remove duplicate entries, i.e. it returns {1,2,3,1,4,5}

I found one solution on the web, however I do not like how it needs to unnest both arrays: select ARRAY(select unnest(ARRAY[1,2,3]) as a UNION select unnest(ARRAY[2,3,4,5]) as a)

Is there an operator or built-in function that will cleanly union two arrays?

vektor
  • 3,312
  • 8
  • 41
  • 71
MrGlass
  • 9,094
  • 17
  • 64
  • 89
  • I don't think there are any set-wise operators or functions for arrays, there's a similar question [about intersections over here](http://stackoverflow.com/q/7020264/479863), you could probably adapt the solution. – mu is too short Jun 10 '14 at 18:58
  • @muistooshort There are a lot of set operator for arrays http://www.postgresql.org/docs/9.1/static/functions-array.html – MrGlass Jun 10 '14 at 20:07
  • 2
    `@>` and `<@` don't really qualify as *a lot*. Or am I missing something? – mu is too short Jun 10 '14 at 20:11

4 Answers4

24

If your problem is to unnest twice this will unnest only once

select array_agg(a order by a)
from (
    select distinct unnest(array[1,2,3] || array[2,3,4,5]) as a
) s;
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • 2
    as a function ```CREATE OR REPLACE FUNCTION mergeArrays (a1 ANYARRAY, a2 ANYARRAY) RETURNS ANYARRAY AS $$ SELECT ARRAY_AGG(x ORDER BY x) FROM ( SELECT DISTINCT UNNEST($1 || $2) AS x ) s; $$ LANGUAGE SQL STRICT;``` – Ryan Guill Dec 21 '16 at 16:40
13

There is a extension intarray (in contrib package) that contains some useful functions and operators:

postgres=# create extension intarray ;
CREATE EXTENSION

with single pipe operator:

postgres=# select array[1,2,3] | array[3,4,5];
  ?column?   
─────────────
 {1,2,3,4,5}
(1 row)

or with uniq function:

postgres=# select uniq(ARRAY[1,2,3] || ARRAY[3,4,5]);
    uniq     
─────────────
 {1,2,3,4,5}
(1 row)

ANSI/SQL knows a multiset, but it is not supported by PostgreSQL yet.

Pavel Stehule
  • 42,331
  • 5
  • 91
  • 94
  • 1
    Thanks, but my real world case isnt actually with integers, so intarray wont help me – MrGlass Jun 10 '14 at 20:09
  • @MrGlass then you need to write own extension. but, I don't think, so unnest has too high overhead. Merge 100K values needs about 123ms on my Lenovo T520. Any extension should not be significantly faster. UNNEST, UNION ALL are fast. – Pavel Stehule Jun 11 '14 at 07:13
  • Yeah, the answer might be that this is the best method. I was just hoping I missed something :) – MrGlass Jun 11 '14 at 14:28
3

The intarray-based answers don't work when you're trying to take the set union of an array-valued column from a group of rows. The accepted array_agg-based answer can be modified to work, e.g.

SELECT selector_column, array_agg(a ORDER BY a) AS array_valued_column
FROM (
    SELECT DISTINCT selector_column, UNNEST(array_valued_column) AS a FROM table
) _ GROUP BY selector_column;

but, if this is buried deep in a complex query, the planner won't be able to push outer WHERE expressions past it, even when they would substantially reduce the number of rows that have to be processed. The right solution in that case is to define a custom aggregate:

CREATE FUNCTION array_union_step (s ANYARRAY, n ANYARRAY) RETURNS ANYARRAY
   AS $$ SELECT s || n; $$
   LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;

CREATE FUNCTION array_union_final (s ANYARRAY) RETURNS ANYARRAY
  AS $$
    SELECT array_agg(i ORDER BY i) FROM (
      SELECT DISTINCT UNNEST(x) AS i FROM (VALUES(s)) AS v(x)
    ) AS w WHERE i IS NOT NULL;
  $$
  LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;

CREATE AGGREGATE array_union (ANYARRAY) (
  SFUNC = array_union_step,
  STYPE = ANYARRAY,
  FINALFUNC = array_union_final,
  INITCOND = '{}',
  PARALLEL = SAFE
);

Usage is

SELECT selector_column, array_union(array_valued_column) AS array_valued_column
  FROM table
  GROUP BY selector_column;

It's doing the same thing "under the hood", but because it's packaged into an aggregate function, the planner can see through it.

It's possible that this could be made more efficient by having the step function do the UNNEST and append the rows to a temporary table, rather than a scratch array, but I don't know how to do that and this is good enough for my use case.

zwol
  • 135,547
  • 38
  • 252
  • 361
2

Can be done like so...

select uniq(sort(array_remove(array_cat(ARRAY[1,2,3], ARRAY[1,4,5]), NULL)))

gives:

{1,2,3,4,5}

array_remove is needed because your can't sort arrays with NULLS. Sort is needed because uniq de-duplicates only if adjacent elements are found.

A benefit of this approach over @Clodoaldo Neto's is that works entire within the select, and doesn't the unnest in the FROM clause. This makes it straightforward to operate on multiple arrays columns at the same time, and in a single table-scan. (Although see Ryan Guill version as a function in the comment).

Also, this pattern works for all array types (who's elements are sortable).

A downside is that, feasibly, its a little slower for longer arrays (due to the sort and the 3 intermediate array allocations).

I think both this and the accept answer fail if you want to keep NULL in the result.

user48956
  • 14,850
  • 19
  • 93
  • 154
  • 1
    Note that `sort` and `uniq` are both from the `intarray` extension and are only documented to work with arrays of int. See https://www.postgresql.org/docs/12/intarray.html . – zwol Jul 31 '20 at 02:30