Postgres GROUP BY integer arrays

Question

My DB table in Postgres 9.3

| f1|f2 | f3
| 1 | 2 | {1,2,3}
| 1 | 3 | {4,5}

f1,f2 are integers f3 is integer[].

How can I get this:

SELECT f1, myfn(f3)
FROM dbTable
GROUP BY f1;

so that I get:

| 1 | {1,2,3,4,5}

Is there a similar function in Postgres like array_agg that concatenates arrays and creates a new array;

score 7 · Accepted Answer · edited Aug 05 '14 at 21:16

7

SQL Fiddle

First unnest then aggregate

select f1, array_agg(f3) as f3
from (
    select f1, unnest(f3) as f3
    from dbtable
) s
group by f1

To avoid duplicates and for a sorted output (intarry extension must be installed):

select f1, sort(uniq(array_agg(f3))) as f3
from (
    select f1, unnest(f3) as f3
    from dbtable
) s
group by f1

edited Aug 05 '14 at 21:16

Alexandros

2,160
4
27
52

answered Aug 05 '14 at 21:04

Clodoaldo Neto

118,695
26
233
260

+1 That is what I thought too after looking at http://stackoverflow.com/questions/7020264/intersection-of-multiple-arrays-in-postgresql. I was hoping for something more efficient (my dbTable is not that simple) – Alexandros Aug 05 '14 at 21:13
@Alexandros It is possible to create your own aggregate function and it would simplify the code but I don't know if it would be more performant. Is code simplicity what you are after? – Clodoaldo Neto Aug 05 '14 at 21:19
Simplicity is not that much of an issue (unless we are talking about a 1000 lines function). I care about performance more but your solution works pretty much OK. If you do not mind I will wait a couple of hours before accepting your answer in case someone proposes something amazing (which I doubt). Hope that is OK – Alexandros Aug 05 '14 at 21:23
Although it seems strange, this solutions seems to perform better (and is more flexible since you can do additional actions on the unnested array elements) than the aggregate function (which is simpler to implement though). So, both are excellent answers. – Alexandros Aug 05 '14 at 22:08

IMSoP · Answer 2 · 2014-08-05T21:45:17.087

4

I don't think there's a built-in aggregate function for that, but you could easily create your own.

Looking at the array functions and operators available, the function you want is array_cat - the operator || is usually more convenient, but we need a function to create the aggregate, so would otherwise have to wrap it in one.

Since we already have a function to use, we can create a very simple aggregate based on it - start with an empty array, concatenate each new item in turn, and no special intermediate data or final steps are necessary.

The CREATE AGGREGATE statement would be as follows (note the use of the anyarray pseudo-type as both the input and output of the aggregate):

CREATE AGGREGATE array_cat_agg(anyarray) (
    SFUNC=array_cat,
    STYPE=anyarray,
    INITCOND='{}'
);

This should then allow you to write exactly what you wanted:

SELECT f1, array_cat_agg(f3)
FROM dbTable
GROUP BY f1;

Here's a demo of this in action on SQLFiddle.

edited Aug 05 '14 at 21:45

answered Aug 05 '14 at 21:36

IMSoP

89,526
13
117
169

+1 your excellent method works as well. I will test performance for both answers and will get back. One thought though. If i am interested only in integer arrays, will setting this on function definition, improve performance? What do you think. Thanks!!! – Alexandros Aug 05 '14 at 21:49
@Alexandros The actual types of the aggregate and function are determined when the statement is parsed, not when it is run, so I doubt there would be any performance benefit in declaring the types as `int[]` instead of `anyarray`. i.e. in our example the parser "knows" that `f3` is of type `int[]`, so resolves the `anyarray` markers in `array_cat_agg`'s argument and state type to mean `int[]`, and the same for the argument and return types of `array_cat` itself. – IMSoP Aug 05 '14 at 21:54

Postgres GROUP BY integer arrays

2 Answers2