6

I have a table like this in PostgreSQL. I want to perform aggregation functions like mean and max for every 16 records based on ID (which is primary key). For example I have to calculate mean value for first 16 records and second 16 records and so on.

+-----+-------------
| ID  |  rainfall  |
+-----+----------- |
|  1  |  110.2     |
|  2  |  56.6      |
|  3  |  65.6      |
|  4  |  75.9      |
+-----+------------
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
f.ashouri
  • 5,409
  • 13
  • 44
  • 52

2 Answers2

12

The 1st approach that comes to mind is to use row_number() to annotate the table, then group by blocks of 16 rows.

SELECT min(id) as first_id, max(id) AS last_id, avg(rainfall) AS avg_this_16
FROM (
  SELECT id, rainfall, row_number() OVER (order by id) AS n
  FROM the_table
) x(id,rainfall,n)
GROUP BY n/16
ORDER BY n/16;

Note that this won't necessarily include 16 samples for the last group.

Alternately you can calculate a running average by using avg() as a window function:

SELECT id, avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING)
FROM the_table;

... possibly annotating that with the row number and selecting the ones you want:

SELECT id AS greatest_id_in_group, avg_last_16_inclusive FROM (
  SELECT
    id, 
    avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING) AS avg_last_16_inclusive,
    row_number() OVER (ORDER BY id) AS n
  FROM the_table
) x WHERE n % 16 = 0;

This will disregard the last n<16 samples, not returning a row for them.

Note that I'm assuming the IDs aren't guaranteed to be contiguous. If they are gap-less, you can just group by id/16 and avoid the window function.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    my records are more than 500 always. The IDs are consecutive without gaps but some of the rainfall values are null values. Your approach is quite working for me. Thank you – f.ashouri Oct 27 '12 at 10:32
  • I'm using your first approach by row_number(), but i'm not quite understand it. I need clarification in the following lines and explanation about why they are used please: 'row_number() OVER (order by id) AS n' and ') x(id,rainfall,n)' . and what is x in the second one? – f.ashouri Oct 27 '12 at 17:41
  • 1
    @user1043898: `x(id,rainfall,n)` is an alias for the derived table that the SELECT produces, you need to give it a name so that you can refer to it and you need to give it a name even if you don't refer to it by name because SQL says so; the things in the parentheses name the derived table's columns (not necessary in this case but a reasonable habit). This might help you with window functions: http://www.postgresql.org/docs/current/static/tutorial-window.html – mu is too short Oct 27 '12 at 18:12
  • I imagine the last line should be like this: `) x WHERE n % 16 = 0;` – f.ashouri Oct 27 '12 at 23:56
  • 1
    @user1043898 You're quite right, using % 1 results in the 1st record having only one sample, though it makes no difference from then on. – Craig Ringer Oct 28 '12 at 00:34
  • I made the `group by` using `(n-1)/4` instead of `n/4` since the number starts at 1. – Ludovic Kuty May 20 '21 at 13:55
0

late answer, but anyway for reference

since ID was said to be continuos and gap-less, then this would result pretty straightforward

SELECT avg(rainfall),string_agg(id::text, ',')
FROM the_table
GROUP BY (id - 1) / 16;

notice the (id - 1) to get the grouping from zero to 15, otherwise first group may dephase

PS: @Craig Ringer gave a hint by the end of his answer, but didn't post is as code

Note that I'm assuming the IDs aren't guaranteed to be contiguous. If they are gap-less, you can just group by id/16 and avoid the window function.

arhak
  • 2,488
  • 1
  • 24
  • 38