Join table on itself - performance

Question

I would like some help with the following join. I have one table (with about 20 million rows) that consists of:

MemberId (Primary Key) | Id (Primary Key) | TransactionDate | Balance

I would like to get the latest Balance for all the customers in one query. I know I could do something like this (I just wrote it from my memory). But this way is terribly slow.

SELECT * 
FROM money 
WHERE money.Id = (SELECT MAX(Id) 
                  FROM money AS m 
                  WHERE m.MemberId = money.MemberId)

Are there any other (faster/smarter) options?

To be sure, the primary key is composite, and the first column is MemberId? — Benoit, Jan 03 '12 at 14:12
Umbrella has a good answer, but I would be really interested to find out how MySQL deals with the `IN` operator (replace the first `=`), if you don't mind. — Jonathan Dickinson, Jan 03 '12 at 14:24
@Benoit Yes the primary key is composite and the first column is MemberId (It's a MyIsam table) — PKK, Jan 03 '12 at 14:28
Thanks, good to know. I saw there are some query optimizer improvements in the [6.0 experimental](http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks) - in theory (the theory being how MsSQL behaves :)) `IN` should be faster; so keep it mind for when MySQL catches up - because it's the 'right' way to be thinking in SQL terms (think in sets and not rows). SQL gets its bad performance name from people using it with a 'row' mindset. — Jonathan Dickinson, Jan 03 '12 at 15:04

score 5 · Accepted Answer · answered Jan 03 '12 at 14:14

5

In all optimization tutorials and screencasts that I've endured through, joins are always favoured over subqueries. When using a sub-query the sub-query is executed for each comparison, where as with a join only once.

SELECT * 
FROM money m
INNER JOIN (
    SELECT memberId, MAX(id) AS maxid
    FROM money
    GROUP BY memberId
) mmax ON mmax.maxid = m.id AND mmax.memberId = m.memberId

answered Jan 03 '12 at 14:14

nikc.org

16,462
6
50
83

Really nice solution. It was much much faster than the query I used – PKK Jan 03 '12 at 14:46

Umbrella · Answer 2 · 2012-01-03T14:34:55.477

2

JOINing is not the best way to go about this. Consider using a GROUP BY clause to sift out the last transaction for each member, like this:

~~SELECT MemberId, MAX(Id), TransactionDate, Balance FROM money GROUP BY MemberId~~

UPDATE

as PKK pointed out, balance will be chosen randomly. It looks like you'll have to perform some sort of join after all. Consider this option:

SELECT MemberId, Id, TransactionDate, Balance FROM money WHERE Id IN (
    SELECT MAX(Id) FROM money GROUP BY MemberId
)

edited Jan 03 '12 at 14:34

answered Jan 03 '12 at 14:12

Umbrella

4,733
2
22
31

Edited to get the Max Id instead of Date – Umbrella Jan 03 '12 at 14:14
TransactionDate and Balance will be chosen randomly. Not according to the MAX(Id) – PKK Jan 03 '12 at 14:18
This is not correct. You will indeed get the maximum id for each member, but both the TransactionDate and Balance will be chosen completely random between each members records. Writing a query like this is not safe. – Andreas Jan 03 '12 at 14:33
I'm afraid PKK is correct. I've edited this to give you another option. – Umbrella Jan 03 '12 at 14:35
@Umbrella it works with the IN but it is slow (just as with the =). The answer from nikc worked nicely and fast – PKK Jan 03 '12 at 14:43

score 0 · Answer 3 · edited May 23 '17 at 11:44

0

Other option is to lookup for NULL values in a left join:

SELECT m1.*
  FROM money m1
  LEFT JOIN money m2 ON m2.memberId = m1.memberId AND m2.id > m1.id
 WHERE m2.memberId IS NULL

But of course Umbrella's answer is better.

edited May 23 '17 at 11:44

Community

1
1

answered Jan 03 '12 at 14:11

Benoit

76,634
23
210
236

Join table on itself - performance

3 Answers3

Linked