2

I have the following query:

select *
from 
  `twitter_posts` 
where 
  `main_handle_id` in (
    select 
      `twitter`.`main_handle_id` 
    from 
      `users` 
      inner join `twitter` on `twitter`.`user_id` = `user`.`id` 
    where 
      `users` LIKE 'foo'
  ) 
order by created_at

This query runs dramatically slow when the order by is used. The twitter_posts table has indexes on the created_at timestamp and on the main_handle_id column.

I used "explain" to see what the engine was planning on doing and noticed a filesort... which I was not expecting to see as an index exists on the twitter_posts table. After skimming through the various posts covering this on Stackoverflow and blogs, none of the examples worked for me. I had a hunch, that maybe, the sql optimizer got confused somehow with the nested select. So I wrapped the query and ordered the result of that by created_at:

select * (select *
from 
  `twitter_posts` 
where 
  `main_handle_id` in (
    select 
      `twitter`.`main_handle_id` 
    from 
      `users` 
      inner join `twitter` on `twitter`.`user_id` = `user`.`id` 
    where 
      `users` LIKE 'foo'
  ) 
)
order by created_at

Using explain on this one, DOES use the index to sort and returns the response in a fraction of what it takes with filesort. So I've "solved" my problem, but I don't understand why SQL would make a different choice based on the solution. As far as I can see, I'm just wrapping the result and ask everything of that result in a seemingly redundant statement...

Edit: I'm still unaware of why the optimizer started using the index on the created_at when using a redundant subquery, but in some cases it wasn't faster. I've now gone with the solution to add the statement "FORCE INDEX FOR ORDER BY created_at_index_name".

Rick James
  • 135,179
  • 13
  • 127
  • 222
jlos
  • 1,010
  • 1
  • 8
  • 12
  • Have you tried using force index ? – Praveen E Dec 27 '17 at 14:04
  • I have yea, but that didn't work either + it's a bit of a dirty solution... Although the redundant query is as well :) So my question isn't how can I optimize this, the solution works. My question is why. I think the answer might very well depend on the MariaDB version I use. I'm not aware of any significant optimization changes between MySQL and Maria... – jlos Dec 27 '17 at 14:30
  • @PraveenE I've updated my answer, working with a redundant select causes a temp table to be created which is in some queries (depends on the inner where statements) makes it slow as well. – jlos Dec 27 '17 at 16:11
  • Now I remember something. It looks like we are misunderstanding how IN condition is working. I will check and come back if I find it – Praveen E Dec 27 '17 at 16:27
  • There is one case where the following happened and I heard in my engineering team that it must be a limitation in Mysql. Mysql optimiser will query on twitter_posts first and then it will query on your inner query next. We might expect that it will query on your inner query first and then using the result of the inner query it would filter the twitter_posts table. This may occur if the query is having IN / NOT IN. Please check your Explain output and let me know if execution order is responsible for your slowness. If my answer solves your problem I will post it as answer @JanVansteenlandt – Praveen E Dec 27 '17 at 16:40
  • I'm running through my use cases, but it's still not consistent, sometimes it's very fast, sometimes it's still very slow. I'll make sure to follow up as soon as I've got my answer. – jlos Dec 27 '17 at 16:42
  • I may get some clue if you can post explain of both your queries – Praveen E Dec 27 '17 at 16:47
  • I would, but they're both the same ;) It's not the original query itself that is now running slow, but when I add an additional conditional clause in the inner where statement, that additional where statement goes really slow. It's not visible in the explain output, as adding or removing inner where statements don't change anything to the explain output. I think it's because it doesn't use any other optimizations anymore besides the forced index, which causes the additional inner where statements to be not optimized... or at least it feels like that's the case. – jlos Dec 27 '17 at 16:56
  • @PraveenE the query I posted has its inner where statement as a variable, it seems that if the "base" version of that inner where statement remains untouched the forced index works perfectly as a speed boost. From the moment that inner where statement gets elaborated on (i.e. users.name LIKE 'bob') the query slows down dramatically. The (extended) explain output is exactly the same, so I couldn't make anything up from that. I'm going to go with a code solution, where if the inner where statements is the base one, use the forced index, if not, don't use it as it screws over the performance 1/2 – jlos Dec 27 '17 at 17:17
  • of the inner where statement apparently. This remains an enigma as to why though :D But I've got a solution and I'll roll with it, maybe someone more knowledgeable than me about optimizations in queries (MariaDB) can solve this on ^^ – jlos Dec 27 '17 at 17:18

1 Answers1

1

Don't use IN ( SELECT ... ); convert that to a JOIN. Or maybe converting to EXISTS ( SELECT ... ) may optimize better.

Please provide the EXPLAINs. Even if they are both the same. Provide EXPLAIN FORMAT=JSON SELECT ... if your version has such.

If the Optimizer is confused by a nested query, why add another nesting??

Which table is users in?

You mention user.id, yet there is no user table in FROM or JOIN.

What version of MySQL (or MariaDB) are you using? 5.6 and especially 5.7 have significant differences in the optimizer, with respect to MariaDB.

Please provide SHOW CREATE TABLE for each table mentioned in the query.

There are hints in your comments that the queries you presented are not the ones giving you trouble. Please do not ask about a query that does not itself demonstrate the problem.

Oh, is it users.name? Need to see the CREATE TABLEs to see if you have a suitable index.

Since MariaDB was mentioned, I am adding that tag. (This is probably the only way to get someone with specific MariaDB expertise involved.)

Rick James
  • 135,179
  • 13
  • 127
  • 222