6

This is my sample query

Select table1.id 
from table1 
where table.id in (select table2.id 
                    from table2 
                    where table2.id in (select table3.id 
                                        from table3)
                    ) 
order by table1.id  
limit 100

On checking the optimiser trace for the above query. Optimiser trace cost

  1. DUPLICATE-WEEDOUT strategy - Cost: 1.08e7
  2. FIRST MATCH strategy - Cost: 1.85e7

As DUPLICATE-WEEDOUT cost is less, mysql took DUPLICATE-WEEDOUT strategy for the above query.

Seems everything good in join_optimization part right. But finally, after checking the join_execution part. DUPLICATE-WEEDOUT usually creates temp table. But here as the heap-size is not enough for temp table, it went on creating ondisk temp table(converting_tmp_table_to_ondisk).

Due to disk temp table my query execution became slower.


So what happened here?

Optimiser trace doesn't calculate the cost of disk table in join-optimisation part itself. If disk table cost was calculated, it would be higher than first match. Then final_semijoin_strategy would be FIRST-MATCH strategy, with this my query would have been faster.

Is there any way MYSQL calculate the cost of disk table in join-optimisation part itself or any other work around for this particular issue?

MYSQ-5.7, INNODB


Note: This is a very dynamic query where multiple condition will add based on request in query. So I have done optimising the query in all possible manner. And finally stuck with this disk table cost issue. Kindly avoid optimising the query(like changing the query structure, forcing first-match strategy). And for increasing the heap size(Im not sure much about it, in different forum many said it might bring different issue in other queries)

vinieth
  • 1,204
  • 3
  • 16
  • 34
  • Given that cost doesn't directly correlate with performance, how do you know that, even if you could get the optimiser to give you the results that you want, that the query would run noticeably faster? Also, why don't you just inner join your 3 tables rather than use IN ? – NickW Dec 14 '20 at 09:48
  • Have you over-simplified the query? I assume the _real_ question is "how can I speed up this query". – Rick James Dec 15 '20 at 20:11

1 Answers1

0

IN( SELECT ... ) has been notoriously inefficient. Try to avoid it.

The query, as presented, is probably equivalent to

SELECT  t1.id
    FROM  t1
    JOIN  t2 USING(id)
    JOIN  t3 USING(id)
    ORDER BY  id
    LIMIT  100

which will optimize nicely.

This formulation should not need to build any temp table, much less a disk-based one.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Yes, its will be equivalent to above the query. But if use JOIN instead of IN, I need group by to remove duplicate. Which makes query execution really bad for our case, so i went with IN flow. – vinieth Dec 16 '20 at 04:36
  • Is `id` the `PRIMARY KEY` of each table? If not, please provide `SHOW CREATE TABLE` for each table. – Rick James Dec 16 '20 at 05:38
  • yes every ID, which i used is a primary key. – vinieth Dec 16 '20 at 05:58
  • @vinieth - Then the tables are 1:1. What would you be `GROUPing BY`? – Rick James Dec 16 '20 at 07:00
  • oops sry. table2.id and table3.id is part of a composite pk – vinieth Dec 16 '20 at 08:23
  • 1
    So, the output might be 100 copies of the same t1.id? Please start over -- with the real query and real `SHOW CREATE TABLEs`. I suspect you have simplified it to the point where we can't give you a straight answer _and_ it probably won't apply to the original query. – Rick James Dec 21 '20 at 21:43