Aggregate functions over arrays

Question

I have a table like this:

+-----+----------------+
| ID  |  array300      |
+-----+----------------+
| 100 | {110,25,53,..} |
| 101 | {56,75,59,...} |
| 102 | {65,93,82,...} |
| 103 | {75,70,80,...} |
+-----+----------------+

The array300 column is an array of 300 elements. I need arrays of 100 elements, where each element is the average of 3 consecutive elements of array300. For this example the answer would look like:
array100
{62.66,...}
{63.33,...}
{80,...}
{75,...}

See my update to the answer. – Ihor Romanchenko Dec 12 '12 at 21:41

Ihor Romanchenko · Accepted Answer · 2012-12-12T21:41:26.887

Try something like this:

SELECT id, unnest(array300) as val, ntile(100) OVER (PARTITION BY id) as bucket_num
FROM your_table

This SELECT gives you 300 records per array300, all with the same id, and assigns each one a bucket_num (1 for the first 3 elements, 2 for the next 3, and so on).
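To see how the buckets get assigned, here is a small sketch on a literal six-element array instead of the real table. It assumes PostgreSQL 9.4 or later for WITH ORDINALITY; the aliases u, val and ord are made up for illustration. Putting unnest in FROM and ordering the window by element position ties the buckets to array order, which ntile() without an ORDER BY does not promise.

```sql
-- Sketch only: literal array, hypothetical aliases (u, val, ord).
-- Requires PostgreSQL 9.4+ for WITH ORDINALITY.
SELECT u.val,
       ntile(2) OVER (ORDER BY u.ord) AS bucket_num
FROM unnest(ARRAY[110, 25, 53, 110, 25, 53]) WITH ORDINALITY AS u(val, ord);
-- bucket_num is 1 for the first three values and 2 for the last three
```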

Then use this select to get the avg of elements in the bucket:

SELECT id, avg(val) as avg_val
FROM (...previous select here...)
GROUP BY id, bucket_num

Next - just aggregate the avg_val into array:

SELECT id, array_agg(avg_val) as array100
FROM (...previous select here...)
GROUP BY id
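Put together, the three steps might look like the single statement below. Treat it as a sketch rather than a tested drop-in: it assumes PostgreSQL 9.4 or later (LATERAL and WITH ORDINALITY), the your_table(id, array300) layout from the question, and the subquery aliases buckets and averages are only illustrative. The ORDER BY inside array_agg is there so the averages come out in bucket order.

```sql
-- Sketch: unnest -> bucket -> average -> re-aggregate, in one statement.
-- Assumes PostgreSQL 9.4+ and a table your_table(id, array300).
SELECT id,
       array_agg(avg_val ORDER BY bucket_num) AS array100
FROM (
    -- step 2: one average per (id, bucket)
    SELECT id, bucket_num, avg(val) AS avg_val
    FROM (
        -- step 1: one row per element, bucketed in array order
        SELECT t.id,
               u.val,
               ntile(100) OVER (PARTITION BY t.id ORDER BY u.ord) AS bucket_num
        FROM your_table AS t
        CROSS JOIN LATERAL unnest(t.array300) WITH ORDINALITY AS u(val, ord)
    ) AS buckets
    GROUP BY id, bucket_num
) AS averages
GROUP BY id;
```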

Details: unnest, ntile, array_agg, OVER (PARTITION BY)

UPD: Try this function:

CREATE OR REPLACE FUNCTION public.array300_to_100 (
  p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
  dim_start int = array_length(p_array300, 1); --size of input array
  dim_end int = 100; -- size of output array
  dim_step int = dim_start / dim_end; --avg batch size
  tmp_sum NUMERIC; --sum of the batch
  result_array NUMERIC[100]; -- resulting array
BEGIN

  FOR i IN 1..dim_end LOOP --from 1 to 100.
    tmp_sum = 0;

    FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
      tmp_sum = tmp_sum + p_array300[j];  
    END LOOP; 

    result_array[i] = tmp_sum / dim_step;
  END LOOP; 

  RETURN result_array;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;

It takes one array300 and outputs one array100. To use it:

SELECT id, array300_to_100(array300)
FROM your_table;
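A quick way to sanity-check the function without touching the real table is to feed it a generated array of 1..300 and look at the first few results. This is only a sketch; the test array and the first_three alias are made up. Note that the function computes the batch size with integer division, so it expects the input length to be a multiple of 100.

```sql
-- Sketch: call the function on the numbers 1..300 and show the first 3 outputs.
-- The batches are (1,2,3), (4,5,6), (7,8,9), so the averages should be 2, 5 and 8.
SELECT (public.array300_to_100(
           ARRAY(SELECT g::numeric
                 FROM generate_series(1, 300) AS g
                 ORDER BY g)
       ))[1:3] AS first_three;
```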

If you have any problems understanding it - just ask me.

Nice application of `ntile()`. – Erwin Brandstetter Dec 10 '12 at 17:14

A.H. · Answer 2 · 2012-12-12T20:11:11.997

Putting the pieces of Ihor's answer into another form:

 select id, array300, (
    select array_agg(z) from
    (
        select avg(x) from 
        (
            select x, ntile(array_length(array300,1)/3) over() from unnest(array300) x
        ) y 
        group by ntile
    ) z
) array100
from your_table

For a small example table like this

 id |       array300        
----+-----------------------
  1 | {110,25,53,110,25,53}
  2 | {56,75,59,110,25,53}
  3 | {65,93,82,110,25,53}
  4 | {75,70,80,110,25,53}

the result is:

 id |       array300        |                   array100                    
----+-----------------------+-----------------------------------------------
  1 | {110,25,53,110,25,53} | {(62.6666666666666667),(62.6666666666666667)}
  2 | {56,75,59,110,25,53}  | {(63.3333333333333333),(62.6666666666666667)}
  3 | {65,93,82,110,25,53}  | {(80.0000000000000000),(62.6666666666666667)}
  4 | {75,70,80,110,25,53}  | {(75.0000000000000000),(62.6666666666666667)}
(4 rows)

Edit: My first version used a fixed ntile(2), which only worked for source arrays of size 6. I've fixed that by using array_length(array300,1)/3 instead.
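The parentheses in the array100 output come from array_agg(z) aggregating the whole row z, so each element is a one-column record rather than a plain number. A possible variant that returns a plain numeric array and pins the element order is sketched below. It assumes PostgreSQL 9.4 or later for WITH ORDINALITY; the aliases t, u, y, b, bucket and avg_x are illustrative only.

```sql
-- Sketch: same idea, but aggregate the avg column itself and order explicitly.
-- Assumes PostgreSQL 9.4+ and a table your_table(id, array300).
SELECT t.id,
       t.array300,
       (
           SELECT array_agg(b.avg_x ORDER BY b.bucket)
           FROM (
               SELECT y.bucket, avg(y.x) AS avg_x
               FROM (
                   -- bucket elements in array order, 3 per bucket
                   SELECT u.x,
                          ntile(array_length(t.array300, 1) / 3)
                              OVER (ORDER BY u.ord) AS bucket
                   FROM unnest(t.array300) WITH ORDINALITY AS u(x, ord)
               ) AS y
               GROUP BY y.bucket
           ) AS b
       ) AS array100
FROM your_table AS t;
```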

score 1 · Answer 3 · answered Dec 18 '12 at 13:47

I can't answer your question completely, but I have found an aggregate function for summing integer arrays. Perhaps someone (or you) can modify it to compute an average.

Source: http://archives.postgresql.org/pgsql-sql/2005-04/msg00402.php

CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS '
  DECLARE
    x ALIAS FOR $1;
    y ALIAS FOR $2;
    a int;
    b int;
    i int;
    res int[];
  BEGIN
    res = x;

    a := array_lower (y, 1);
    b := array_upper (y, 1);

    IF a IS NOT NULL THEN
      FOR i IN a .. b LOOP
        res[i] := coalesce(res[i],0) + y[i];
      END LOOP;
    END IF;

    RETURN res;
  END;
'
LANGUAGE plpgsql STRICT IMMUTABLE;

--- then this aggregate lets me sum integer arrays...

CREATE AGGREGATE sum_integer_array (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);


Here's how my sample table looked, along with my new array-summing aggregate and function:

#SELECT * FROM arraytest ;
 id | somearr
----+---------
 a  | {1,2,3}
 b  | {0,1,2}
(2 rows)

#SELECT sum_integer_array(somearr) FROM arraytest ;
 sum_integer_array
-------------------
 {1,3,5}
(1 row)
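One way to get from this sum to an average, sketched under assumptions: divide each summed element by the row count. Note that this averages element-wise across rows, which is not the same as averaging groups of 3 inside one array as the question asks. It assumes the sum_integer_array aggregate and arraytest table shown above, PostgreSQL 9.4 or later, and that all arrays have the same length; the aliases agg, u, elem, ord and avg_array are made up.

```sql
-- Sketch: element-wise average across rows, built on sum_integer_array.
-- Assumes PostgreSQL 9.4+ (LATERAL, WITH ORDINALITY) and equal-length arrays.
SELECT array_agg(u.elem::numeric / agg.n ORDER BY u.ord) AS avg_array
FROM (
    SELECT sum_integer_array(somearr) AS sums,
           count(*) AS n
    FROM arraytest
) AS agg
CROSS JOIN LATERAL unnest(agg.sums) WITH ORDINALITY AS u(elem, ord);
-- for the two sample rows the values should be 0.5, 1.5 and 2.5
```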

Tomas Greif · Answer 4 · 2012-12-18T15:06:58.970

Is this any faster?

Edit: This is more elegant:

with  t as (select generate_series(1, 298, 3) a , generate_series(2, 299, 3) b , generate_series(3, 300, 3) c) -- a, b, c: positions of the 1st, 2nd and 3rd element of each consecutive triple

    select 
        id,
        array_agg((array300[a] + array300[b] + array300[c]) / 3::numeric order by a)  as avg
    from 
        t,
        tmp.test2
    group by 
        id

End of edit

Edit 2: This is the shortest select I can think of:

select 
    id,
    array_agg((array300[3*a-2] + array300[3*a-1] + array300[3*a]) / 3::numeric order by a)  as avg -- elements 1-3, 4-6, ...
from 
    (select generate_series(1, 100,1) a) t,
    tmp.test2
group by 
    id

End of edit2

with 

t as (select generate_series(1, 298, 3) a , generate_series(2, 299, 3) b , generate_series(3, 300, 3) c) -- a, b, c: positions of the 1st, 2nd and 3rd element of each consecutive triple

,u as (
    select 
        id,
        a,
        (array300[a] + array300[b] + array300[c]) / 3::numeric as avg
    from 
        t,
        tmp.test2 /* table with arrays - id, array300 */
    order by 
        id,
        a
 )

select 
    id, 
    array_agg(avg)
from 
    u 
group by 
    id
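If the array length is not always 300, the subscript approach can derive the number of output elements from the array itself. The following is a sketch, assuming PostgreSQL 9.3 or later for LATERAL and the tmp.test2(id, array300) table used above; it averages consecutive triples (elements 1-3, 4-6, ...) as in the question's example, and the alias t is illustrative.

```sql
-- Sketch: average consecutive triples by subscript, output size taken from the input.
-- Assumes PostgreSQL 9.3+ and that the array length is a multiple of 3.
SELECT t.id,
       array_agg(
           (t.array300[3*a - 2] + t.array300[3*a - 1] + t.array300[3*a]) / 3::numeric
           ORDER BY a
       ) AS avg
FROM tmp.test2 AS t
CROSS JOIN LATERAL generate_series(1, array_length(t.array300, 1) / 3) AS a
GROUP BY t.id;
```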
