
We have tested a quite interesting SQL query. Unfortunately, it turned out that this query runs rather slowly - O(n²) - and we are looking for an optimized solution, or maybe a totally different one.

Goal:

We would like to get, for:
 - some customers ("record_customer_id"), e.g. ID 5
     - the latest 2 "record_init_proc_id" values
       - for every "record_inventory_id"

http://www.sqlfiddle.com/#!9/07e5d/4
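
To make the goal concrete, here is a minimal sketch of the intended result (this is not the query from the fiddle, and it is not meant as the fast solution; it assumes that a higher "record_init_proc_id" counts as "more recent"):

```sql
-- Illustration of the goal only, not the fiddle query and not a performance
-- recommendation. For customer 5, keep a row when fewer than 2 rows of the
-- same inventory have a higher record_init_proc_id, i.e. the latest 2 per
-- record_inventory_id.
SELECT t.record_inventory_id,
       t.record_init_proc_id
FROM   test2 AS t
WHERE  t.record_customer_id = 5
  AND  2 > ( SELECT COUNT(*)
             FROM   test2 AS newer
             WHERE  newer.record_customer_id  = t.record_customer_id
               AND  newer.record_inventory_id = t.record_inventory_id
               AND  newer.record_init_proc_id > t.record_init_proc_id )
ORDER  BY t.record_inventory_id, t.record_init_proc_id DESC;
```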

The query in the fiddle works and returns the correct results, but it uses at least two full table scans, which is of course terrible once there are many rows to scan.

Is it possible to (a) first select all record_customer_id(s) in a specific date range, save the results of that query, and then (b) run the working query from the SQL Fiddle against this saved subset?
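
Roughly the shape we have in mind; the "record_date" column is hypothetical and only used to illustrate the date-range pre-filter:

```sql
-- Step (a): save the date-range subset. `record_date` is a hypothetical
-- column name used only for illustration; adjust to the real schema.
CREATE TEMPORARY TABLE candidates AS
SELECT record_customer_id, record_inventory_id, record_init_proc_id
FROM   test2
WHERE  record_date BETWEEN '2015-01-01' AND '2015-03-31';

-- Step (b): run the working query from the fiddle against `candidates`
-- instead of test2.
```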

Or is there maybe a totally different approach that offers better performance?

Any help is greatly appreciated!

cahro88
  • There's no guarantee that the variables in your select query are going to be evaluated in the order you desire. `@g:= t.record_customer_id g,` may be evaluated before or after the `case` statement. – pala_ May 20 '15 at 07:45
  • You have no indexes defined in that sqlfiddle - is it the same for the real table? If yes then adding some might really help you. – jkavalik May 20 '15 at 07:54
  • @user1786423: Yes, there are indexes defined. The query performs two table scans. This should be avoided. – cahro88 May 20 '15 at 08:23
  • @pala: The query works just fine, however. – cahro88 May 20 '15 at 08:25
  • As I said, there is no guarantee. It may be, or it may not be. [see here](https://dev.mysql.com/doc/refman/5.0/en/user-variables.html) – pala_ May 20 '15 at 08:27
  • @pala: Okay, do you have an idea for a query we could use instead? – cahro88 May 20 '15 at 08:29
  • you will probably need to implement something like [tag:greatest-n-per-group] – jkavalik May 20 '15 at 08:41
  • @pala: I guess this would not fix the inefficiency. I would rather focus on a proper overall solution. – cahro88 May 20 '15 at 09:02
  • Just do `CREATE INDEX record_customer_id ON test2( record_customer_id );` - in your example it reduces the number of examined rows by 50%, from 72 to 35. Compare the plans in your demo: http://www.sqlfiddle.com/#!9/07e5d/4 and the same but with the index created: http://www.sqlfiddle.com/#!9/265b2/1 – krokodilko May 20 '15 at 17:39
  • Thanks for that idea. In the real DB the indexes already exist. Unfortunately, it is still quadratic, O(n²). With 500K or more than 1M rows, performance would be horrible. Do you have an idea for a different query, e.g. selecting all rows where customer_id IN (..) first and then running the existing query on that subset? – cahro88 May 20 '15 at 19:27

1 Answer


I give a generic O(N) solution in my Groupwise-Max blog.
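
For reference, a minimal sketch of that kind of single-pass, user-variable row-numbering technique (not taken verbatim from the blog; it assumes a higher `record_init_proc_id` is newer, and, as noted in the comments above, MySQL does not formally guarantee the evaluation order of user-variable assignments):

```sql
-- Sketch only: number the rows of each inventory in one ordered pass and
-- keep the first two per record_inventory_id for customer 5.
SELECT record_customer_id, record_inventory_id, record_init_proc_id
FROM (
    SELECT t.record_customer_id,
           t.record_inventory_id,
           t.record_init_proc_id,
           @rn  := IF(@grp = t.record_inventory_id, @rn + 1, 1) AS rn,
           @grp := t.record_inventory_id                        AS grp
    FROM   test2 AS t
    CROSS JOIN (SELECT @rn := 0, @grp := NULL) AS init
    WHERE  t.record_customer_id = 5
    ORDER  BY t.record_inventory_id DESC, t.record_init_proc_id DESC
) AS ranked
WHERE rn <= 2;
```

An index on `(record_customer_id, record_inventory_id, record_init_proc_id)` matches both the WHERE clause and the ORDER BY of the inner query.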

Rick James