2

I would like some help with the following join. I have one table (with about 20 million rows) that consists of:

MemberId (Primary Key) | Id (Primary Key) | TransactionDate | Balance

I would like to get the latest Balance for all the customers in one query. I know I could do something like this (I just wrote it from my memory). But this way is terribly slow.

SELECT * 
FROM money 
WHERE money.Id = (SELECT MAX(Id) 
                  FROM money AS m 
                  WHERE m.MemberId = money.MemberId)

Are there any other (faster/smarter) options?

rene
  • 41,474
  • 78
  • 114
  • 152
PKK
  • 127
  • 4
  • 15
  • To be sure, the primary key is composite, and the first column is MemberId? – Benoit Jan 03 '12 at 14:12
  • Umbrella has a good answer, but I would be really interested to find out how MySQL deals with the `IN` operator (replace the first `=`), if you don't mind. – Jonathan Dickinson Jan 03 '12 at 14:24
  • @Benoit Yes the primary key is composite and the first column is MemberId (It's a MyIsam table) – PKK Jan 03 '12 at 14:28
  • @JonathanDickinson It's slow with the IN also – PKK Jan 03 '12 at 14:37
  • Thanks, good to know. I saw there are some query optimizer improvements in the [6.0 experimental](http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks) - in theory (the theory being how MsSQL behaves :)) `IN` should be faster; so keep it mind for when MySQL catches up - because it's the 'right' way to be thinking in SQL terms (think in sets and not rows). SQL gets its bad performance name from people using it with a 'row' mindset. – Jonathan Dickinson Jan 03 '12 at 15:04

3 Answers3

5

In all optimization tutorials and screencasts that I've endured through, joins are always favoured over subqueries. When using a sub-query the sub-query is executed for each comparison, where as with a join only once.

SELECT * 
FROM money m
INNER JOIN (
    SELECT memberId, MAX(id) AS maxid
    FROM money
    GROUP BY memberId
) mmax ON mmax.maxid = m.id AND mmax.memberId = m.memberId
nikc.org
  • 16,462
  • 6
  • 50
  • 83
2

JOINing is not the best way to go about this. Consider using a GROUP BY clause to sift out the last transaction for each member, like this:

SELECT MemberId, MAX(Id), TransactionDate, Balance FROM money GROUP BY MemberId

UPDATE

as PKK pointed out, balance will be chosen randomly. It looks like you'll have to perform some sort of join after all. Consider this option:

SELECT MemberId, Id, TransactionDate, Balance FROM money WHERE Id IN (
    SELECT MAX(Id) FROM money GROUP BY MemberId
)
Umbrella
  • 4,733
  • 2
  • 22
  • 31
  • Edited to get the Max Id instead of Date – Umbrella Jan 03 '12 at 14:14
  • TransactionDate and Balance will be chosen randomly. Not according to the MAX(Id) – PKK Jan 03 '12 at 14:18
  • This is not correct. You will indeed get the maximum id for each member, but both the TransactionDate and Balance will be chosen completely random between each members records. Writing a query like this is not safe. – Andreas Jan 03 '12 at 14:33
  • I'm afraid PKK is correct. I've edited this to give you another option. – Umbrella Jan 03 '12 at 14:35
  • @Umbrella it works with the IN but it is slow (just as with the =). The answer from nikc worked nicely and fast – PKK Jan 03 '12 at 14:43
0

Other option is to lookup for NULL values in a left join:

SELECT m1.*
  FROM money m1
  LEFT JOIN money m2 ON m2.memberId = m1.memberId AND m2.id > m1.id
 WHERE m2.memberId IS NULL

But of course Umbrella's answer is better.

Community
  • 1
  • 1
Benoit
  • 76,634
  • 23
  • 210
  • 236