
We have tested a quite interesting SQL query. Unfortunately, it turned out that this query runs rather slowly - O(n²) - and we are looking for an optimized solution, or maybe a totally different one.

Goal:

We would like to get, for:
 - some customers ("record_customer_id"), e.g. ID 5
     - the latest 2 "record_init_proc_id" values
       - for every "record_inventory_id"

http://www.sqlfiddle.com/#!9/07e5d/4
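
To make the goal concrete, here is a minimal sketch of the intended result (this is not the query from the fiddle, and it is not meant as the fast solution; it assumes that a higher "record_init_proc_id" counts as "more recent"):

```sql
-- Illustration of the goal only, not the fiddle query and not a performance
-- recommendation. For customer 5, keep a row when fewer than 2 rows of the
-- same inventory have a higher record_init_proc_id, i.e. the latest 2 per
-- record_inventory_id.
SELECT t.record_inventory_id,
       t.record_init_proc_id
FROM   test2 AS t
WHERE  t.record_customer_id = 5
  AND  2 > ( SELECT COUNT(*)
             FROM   test2 AS newer
             WHERE  newer.record_customer_id  = t.record_customer_id
               AND  newer.record_inventory_id = t.record_inventory_id
               AND  newer.record_init_proc_id > t.record_init_proc_id )
ORDER  BY t.record_inventory_id, t.record_init_proc_id DESC;
```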

The query in the fiddle works and returns the correct results, but it uses at least two full table scans, which is of course terrible once there are many rows to scan.

Is it possible to (a) first select all record_customer_id(s) in a specific date range, save the results of that query, and then (b) run the working query from the SQL Fiddle against this saved subset?
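
Roughly the shape we have in mind; the "record_date" column is hypothetical and only used to illustrate the date-range pre-filter:

```sql
-- Step (a): save the date-range subset. `record_date` is a hypothetical
-- column name used only for illustration; adjust to the real schema.
CREATE TEMPORARY TABLE candidates AS
SELECT record_customer_id, record_inventory_id, record_init_proc_id
FROM   test2
WHERE  record_date BETWEEN '2015-01-01' AND '2015-03-31';

-- Step (b): run the working query from the fiddle against `candidates`
-- instead of test2.
```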

Or is there maybe a totally different approach that offers better performance?

Any help is greatly appreciated!

cahro88
  • There's no guarantee that the variables in your select query are going to be evaluated in the order you desire. `@g:= t.record_customer_id g,` may be evaluated before or after the `case` statement. – pala_ May 20 '15 at 07:45
  • You have no indexes defined in that sqlfiddle - is it the same for the real table? If yes then adding some might really help you. – jkavalik May 20 '15 at 07:54
  • @user1786423: Yes, there are indexes defined. The query performs two table scans. This should be avoided. – cahro88 May 20 '15 at 08:23
  • @pala: The query works just fine, however. – cahro88 May 20 '15 at 08:25
  • As I said, there is no guarantee. It may be, or it may not be. [see here](https://dev.mysql.com/doc/refman/5.0/en/user-variables.html) – pala_ May 20 '15 at 08:27
  • @pala: Okay, do you have an idea for a query we could use instead? – cahro88 May 20 '15 at 08:29
  • you will probably need to implement something like [tag:greatest-n-per-group] – jkavalik May 20 '15 at 08:41
  • @pala: I guess this would not fix the inefficiency. I would rather focus on a proper overall solution. – cahro88 May 20 '15 at 09:02
  • Just do `CREATE INDEX record_customer_id ON test2( record_customer_id );` - in your example it reduces the number of examined rows by 50%, from 72 to 35. Compare the plans in your demo: http://www.sqlfiddle.com/#!9/07e5d/4 and the same but with the index created: http://www.sqlfiddle.com/#!9/265b2/1 – krokodilko May 20 '15 at 17:39
  • Thanks for that idea. In the real DB the indexes already exist. Unfortunately, it is still quadratic, O(n²). With 500K or more than 1M rows, performance would be horrible. Do you have an idea for a different query, e.g. selecting all rows where customer_id IN (..) first and then running the existing query on that subset? – cahro88 May 20 '15 at 19:27

1 Answer


I give a generic O(N) solution in my Groupwise-Max blog.
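
For reference, a minimal sketch of that kind of single-pass, user-variable row-numbering technique (not taken verbatim from the blog; it assumes a higher `record_init_proc_id` is newer, and, as noted in the comments above, MySQL does not formally guarantee the evaluation order of user-variable assignments):

```sql
-- Sketch only: number the rows of each inventory in one ordered pass and
-- keep the first two per record_inventory_id for customer 5.
SELECT record_customer_id, record_inventory_id, record_init_proc_id
FROM (
    SELECT t.record_customer_id,
           t.record_inventory_id,
           t.record_init_proc_id,
           @rn  := IF(@grp = t.record_inventory_id, @rn + 1, 1) AS rn,
           @grp := t.record_inventory_id                        AS grp
    FROM   test2 AS t
    CROSS JOIN (SELECT @rn := 0, @grp := NULL) AS init
    WHERE  t.record_customer_id = 5
    ORDER  BY t.record_inventory_id DESC, t.record_init_proc_id DESC
) AS ranked
WHERE rn <= 2;
```

An index on `(record_customer_id, record_inventory_id, record_init_proc_id)` matches both the WHERE clause and the ORDER BY of the inner query.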

Rick James