2

I am trying to get the last element of an ordered set, stored in a database table. The ordering is defined by one of the columns in the table. Also the table contains multiple sets, so I want the last one for each of the sets.

As an example consider the following table:

benchmarks=# select id,sorter from aggtest ;
 id | sorter 
----+--------
  1 |      1
  3 |      1
  5 |      1
  2 |      2
  7 |      2
  4 |      1
  6 |      2
(7 rows)

Sorter 1 and 2 define each of the sets, sets are ordered by the id column. To get the last element of each set, I defined an aggregate function:

CREATE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT $2;
$$;

CREATE AGGREGATE public.last (
        sfunc    = public.last_agg,
        basetype = anyelement,
        stype    = anyelement
);

As explained here.

However when I use this I get:

benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter;
 last | sorter 
------+--------
    4 |      1
    6 |      2
(2 rows)

However, I want to get (5,1) and (7,2) as these are the last ids (numerically) in the set. Looking at how the aggregate mechanism works, I can see quite well, why the result is not what I want. The items are returned in the order I added them, and then aggregated so that the last one I added is returned.

I tried sorting by ids, so that each group is sorted independently, however that does not work:

benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,id;
ERROR:  column "aggtest.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...(id),sorter from aggtest group by sorter order by sorter,id;

If I wrap the sorting criteria in another aggregate, I get wrong data again:

benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,last(id);
 last | sorter 
------+--------
    4 |      1
    6 |      2
(2 rows)

Also grouping by id in addition to sorter does not work obviously.

Of course there is an easier way, to get the last (highest) id for each group by using the max aggregate. However, I am not so much interested in the id but as in data associated with it (i.e. in the same row). Hence I do not to sort by id and then aggregate so that the row with the highest id is returned for each group.

What is the best way to accomplish this?

EDIT: Why does max(id) grouped by sorter not work

Assume the following complete table (unsorter represents the additional data I have in the table):

benchmarks=# select * from aggtest ;
 id | sorter | unsorter 
----+--------+----------
  1 |      1 |        1
  3 |      1 |        2
  5 |      1 |        3
  2 |      2 |        4
  7 |      2 |        5
  4 |      1 |        6
  6 |      2 |        7
(7 rows)

I would like to retrieve the lines:

 id | sorter | unsorter 
----+--------+----------
  5 |      1 |        3
  7 |      2 |        5

However with max(id) and grouping by sorter I get:

benchmarks=# select max(id),sorter,unsorter from aggtest group by sorter;
ERROR:  column "aggtest.unsorter" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select max(id),sorter,unsorter from aggtest group by sorter;

Using a max(unsorter) obviously does not work either:

benchmarks=# select max(id),sorter,max(unsorter) from aggtest group by sorter;
 max | sorter | max 
-----+--------+-----
   5 |      1 |   6
   7 |      2 |   7
(2 rows)

However using distinct (the accepted answer) I get:

benchmarks=# select distinct on (sorter) id,sorter,unsorter from aggtest order by sorter, id desc;
 id | sorter | unsorter 
----+--------+----------
  5 |      1 |        3
  7 |      2 |        5
(2 rows)

Which has the correct additional data. The join approach also seems to work, by is slightly slower on the test data.

LiKao
  • 10,408
  • 6
  • 53
  • 91
  • 2
    SQL tables represent *unordered* sets. There is no concept of "first" and "last" unless a column explicitly provides the ordering. – Gordon Linoff Mar 13 '15 at 12:33
  • @GordonLinoff yes, sure. Maybe the question wasn't clear in that regard. Ordering should be provided by the id column in the example above. However the aggregate function does not use that ordering, but rather the order in which the rows are returned (which may be unpredictable in some cases AFAIK). – LiKao Mar 13 '15 at 12:41
  • 3
    Can you elaborate on why using `max(id)` grouped by sorter wouldn't work? – jpw Mar 13 '15 at 12:52
  • @jpw see edit above. – LiKao Mar 13 '15 at 13:34

3 Answers3

2

Why not use a window function:

select id, sorter
from (
   select id, sorter, 
          row_number() over (partition by sorter order by id desc) as rn
   from aggtest
) t
where rn = 1;

Or using Postgres distinct on operator which is usually faster:

select distinct on (sorter) id, sorter
from aggtest
order by sorter, id desc
1

You write:

Of course there is an easier way, to get the last (highest) id for each group by using the max aggregate. However, I am not so much interested in the id but as in data associated with it (i.e. in the same row).

This query will give you the data associated with the highest id of each sorter group.

select a.* from aggtest a
join (
    select max(id) max_id, sorter 
    from aggtest
    group by sorter
) b on a.id = b.max_id and a.sorter = b.sorter
jpw
  • 44,361
  • 6
  • 66
  • 86
0

select distinct max(id) over (partition by sorter) id,sorter from aggtest order by 2 asc

returns: 5;1 7;2

Vao Tsun
  • 47,234
  • 13
  • 100
  • 132
  • Does not work as expected, unfortunately. If I add the additional data, I get the correct id and set number (sorter), but repeated each time for each of the values in the additional column. – LiKao Mar 13 '15 at 13:16