-1

(Simplified example). I have this table:

USER ID |  code   |   sales

 1          x          100
 1          y          200
 1          z          150

 2          x          300
 2          z          17
 2          y          100

For each user, I need to show just the record with the max sales.

USER ID |  code   |   sales

 1          y          200
 2          x          300

In the real table I have, It is millions of records. What would be the most efficient way of doing this?

Right now I find for each user the max sales record and join it unto itself to find the full record.

SELECT * from T
WHERE sales = (SELECT MAX(sales) FROM T WHERE user_id=outr.user_id)

Let's assume that sales number does not repeat itself for a specific user.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Itay Moav -Malimovka
  • 52,579
  • 61
  • 190
  • 278
  • Honest curious about efficiency argumentation: it may help me and maybe other people what to look for in detail when it comes to finding the most efficient solution, could you provide comments? – lemon May 15 '22 at 13:25
  • @lemon can u give an example? I tried to abstract the problem here, so it is not specific to a certain domain – Itay Moav -Malimovka May 16 '22 at 20:21
  • Actually my question was for the database experts of this platform, like Bill or people who are most probably in the field longer than me, and that have more words when it comes to efficiency discussions. – lemon May 16 '22 at 20:37

2 Answers2

1

With Postgres the most efficient way is typically using distinct on()

select distinct on (user_id) *
from the_table
order by user_id, sales desc;
0

Refactor your dependent subquery into an independent subquery and then JOIN it.

SELECT T.* 
  FROM T
  JOIN (
          SELECT MAX(sales) max_sales, user_id
            FROM T
           GROUP BY user_id
       ) M   ON T.user_id = M.user_id
            AND T.sales = M.max_sales;

The subquery gets the largest sale for each user it. The JOIN operation's ON clause retrieves the detail record for each user's largest sale.

A multicolumn index on (user_id, sales) will make this query work efficiently when running on a large table.

This works on both and .

O. Jones
  • 103,626
  • 17
  • 118
  • 172