2

I've seen a few questions dabbling with the inefficiency of "NOT IN" in MySQL queries, but I didn't manage to reproduce the proposed solutions.

So I've got some sort of search engine. It starts with very simple queries, and then tries more complicated ones if it doesn't find enough results. Here's how it works in pseudocode

list_of_ids = do_simple_search()
nb_results = size_of(list_of_ids)

if nb_results < max_nb_results :
    list_of_ids .= do_search_where_id_not_in(list_of_ids)

    if nb_results < max_nb_results :
         list_of_ids .= do_complicated_search_where_id_not_in(list_of_ids)

Hope I'm clear. Anyway here's the slow query, as shown by MySQL-slow :

SELECT DISTINCT c.id 
FROM clients c LEFT JOIN communications co ON c.id = co.client_id 
WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND c.id NOT IN(N)
LIMIT N, N

And here's an EXPLAIN on that query :

id  select_type     table   type    possible_keys               key         key_len ref             rows    Extra
1   SIMPLE          c       index   PRIMARY                     PRIMARY     2       NULL            25250   Using where; Using index; Using temporary
1   SIMPLE          co      ref     qui_com,id_client,titre     id_client   2       klients.c.id    8       Using where; Distinct

MySQL version is 5.1.63-0ubuntu0.11.04.1-log

Maybe my approach is wrong here ? How would you do it ? thanks.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
Bootdisk
  • 75
  • 6
  • are you having performance problems? – serakfalcon Jun 05 '14 at 14:24
  • 1
    If you want us to help optimize a query, **you need to show us the table and index definitions**, as well as row counts for each of the tables. Maybe your tables are defined poorly. Maybe the indexes aren't created correctly. Maybe you don't have an index on that column you thought you did. Without seeing the table and index definitions, we can't tell. We also need row counts because that can affect query optimization greatly. If you know how to do an `EXPLAIN` or get an execution plan, put the results in the question as well. If you have no indexes, visit http://use-the-index-luke.com ASAP. – Andy Lester Jun 05 '14 at 15:26
  • Thanks for the heads up, I'll try to figure out a bit by myself and then come back – Bootdisk Jun 06 '14 at 12:49

1 Answers1

1

A couple of remarks:

1) Why do you do LEFT JOIN i/o (INNER) JOIN? LEFT JOIN means that you want to also get the records which are not matched agains clients, is that the intention? If not, then JOIN i/o LEFT JOIN is quicker.

2) Why do you need JOIN at all if you can simply do:

SELECT DISTINCT co.client_id from communications co 
WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND co.id!=N LIMIT N,N;

Also, if you do a JOIN, both joined fields must be indexes, otherwise it's slow too.

More importantly, you condition both client_id and id from communications table, but there is no common index for these both, which means more work to execute your query (hence using temporary which is in general not a good sign).

3) You do a complex condition on both co.titre and co.contenu, you seem to have indexes but they are not used. That means this part might be potentially quite slow.

Ashalynd
  • 12,363
  • 2
  • 34
  • 37
  • Oh wow, I can't believe I overlooked that I didn't need that join. Thanks a lot, that's indeed simpler. – Bootdisk Jun 05 '14 at 14:43
  • 1
    Damn, I just realized why I had to do the JOIN. In some cases, I'm checking for the clients' status too. So you suggest I replace my LEFT JOINs with JOINs. SQL has never been my strongest point :/ – Bootdisk Jun 06 '14 at 11:13
  • I thought something like that. You can still JOIN with the clients table and use fields from there, of course. – Ashalynd Jun 06 '14 at 11:18