How to replace a DISTINCT ON with GROUP BY in PostgreSQL 9?

Question

I have been using the DISTINCT ON predicate and have decided to replace it with GROUP BY, mainly because it "is not part of the SQL standard and is sometimes considered bad style because of the potentially indeterminate nature of its results".

I am using DISTINCT ON in conjunction with ORDER BY in order to select the latest records in a history table, but it's not clear to me how to do the same with the GROUP BY. What could be a general approach in order to move from one construct to the other one?

An example could be

SELECT
  DISTINCT ON (f1, f2 ) *
  FROM table
  ORDER BY f1, f2, datefield DESC;

where I get the "latest" pairs of (f1,f2).

Chapter 7.3.3 discourages the use of DISTINCT ON. As soon as I'll found that email I'll quote it here about the planner and the "distinct on" optimizations. — EnzoR, Jan 29 '15 at 13:48
"*the potentially indeterminate nature of its results*" - only if you forget the `order by` or specify a sort order that is not stable (e.g. where duplicate values can occur in the sort column) - but that problem applies to the solution using a window function as well. — , Jan 30 '15 at 10:46

Gordon Linoff · Answer 1 · 2015-01-29T13:53:11.510

3

If you have a query like this:

select distinct on (col1) t.*
from table t
order by col1, col2

Then you would replace this with window functions, not a group by:

select t.*
from (select t.*,
             row_number() over (partition by col1 order by col2) as seqnum
      from table t
     ) t
where seqnum = 1;

edited Jan 29 '15 at 13:53

answered Jan 29 '15 at 13:44

Gordon Linoff

1,242,037
58
646
786

How to replace a DISTINCT ON with GROUP BY in PostgreSQL 9?

1 Answers1