More efficient way to SELECT rows from PARTITION BY

Question

Suppose I have the following table:

+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
|  1 |           1 |           3 |
|  1 |           2 |           3 |
|  1 |           3 |           4 |
|  2 |           2 |           3 |
|  2 |           3 |           4 |
|  2 |           4 |           5 |
+----+-------------+-------------+

My desired results are:

+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
|  1 |           1 |           3 |
|  2 |           2 |           3 |
+----+-------------+-------------+

My current solution is:

SELECT
    *
FROM
(SELECT
    id,
    step_number,
    MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
    employee_id
FROM
    table_name) AS t
WHERE
    t.step_number = t.min_step_number

Is there a more efficient way I could be doing this?

I'm currently using postgresql, version 12.

score 4 · Accepted Answer · answered Apr 15 '20 at 22:20

In Postgres, I would recommend using distinct on to adress this greatest-n-per-group problem:

select distinct on (id) t.*
from mytbale t
order by id, step_number

This Postgres extension to the SQL standard has usually better performance than the standard approach using window functions (and, as a bonus, the syntax is neater).

Note that this assumes unicity of (id, step_number) tuples: otherwise, the results might be different than those of your query (which allows ties, while distinct on does not).

More efficient way to SELECT rows from PARTITION BY

1 Answers1