0

I am having slow return times based on amount of data for the following query.

mysql> explain select  *
    from  worker_location
    where  gate_id not in (
        SELECT  gate_id from  worker_address
                          );
+----+--------------------+---------------------+-------+---------------------------+---------------------------+---------+------+---------+--------------------------+
| id | select_type        | table               | type  | possible_keys             | key                       | key_len | ref  | rows    | Extra                    |
+----+--------------------+---------------------+-------+---------------------------+---------------------------+---------+------+---------+--------------------------+
|  1 | PRIMARY            | worker_location     | ALL   | NULL                      | NULL                      | NULL    | NULL |  527347 | Using where              |
|  2 | DEPENDENT SUBQUERY | worker_address      | index | gate_id_idx               | gate_id_ix                | 48      | NULL | 3041342 | Using where; Using index |
+----+--------------------+---------------------+-------+---------------------------+---------------------------+---------+------+---------+--------------------------+
2 rows in set (0.00 sec)

I tried using a left join but am getting the same plan and no benefit in speed.

mysql> explain select  *
    from  worker_location wl
    left join  worker_address wa  ON wl.gate_id=wa.gate_id
    where  wa.gate_id is null;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------------+
|  1 | SIMPLE      | wl    | ALL  | NULL          | NULL | NULL    | NULL |  527347 | NULL                                               |
|  1 | SIMPLE      | wa    | ALL  | NULL          | NULL | NULL    | NULL | 3041342 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------------+
2 rows in set (0.00 sec)

Is there a way to optimize this query any further?

Rick James
  • 135,179
  • 13
  • 127
  • 222
user3299633
  • 2,971
  • 3
  • 24
  • 38
  • 1
    Make sure to select only the columns that you need. Don't use * and fetch unnecessary columns that you don't need. – Channaveer Hakari Sep 27 '18 at 21:22
  • All columns are needed. – user3299633 Sep 27 '18 at 21:25
  • 3
    Do you have index for `worker_location.gate_id` column ? – Oto Shavadze Sep 27 '18 at 21:40
  • Yes all columns are indexed. – user3299633 Sep 27 '18 at 22:14
  • Not much can be done. Could try playing with increasing the session [join_buffer_size](https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_join_buffer_size), and [BNL](https://dev.mysql.com/doc/refman/5.7/en/nested-loop-joins.html) and [BKA](https://dev.mysql.com/doc/refman/5.7/en/bnl-bka-optimization.html) – danblack Sep 28 '18 at 00:07
  • Try : select * from worker_location where not exists (select * from worker_address where worker_address.gate_id = worker_location.gate_id); – DanB Sep 28 '18 at 03:11
  • Can you share your indexes? How large is your dataset? What is your definition of poor performance? – Derrick Moeller Oct 03 '18 at 19:25

2 Answers2

0

see in your explain output two issues:

no indexesbeing used - see 'possible_keys'& 'key' columns

'Block Nested Loop' optimization triggered which could possibly hit a bag

you can try playnig with index hints

and/or try to disable bnl by: SET SESSION optimizer_switch='block_nested_loop=off';

Dmytro Kh.
  • 134
  • 1
  • 12
0

Some analysis:

  • You necessarily need to touch about 527347 rows of the first table.
  • For each of those it checks the other table.
  • Why is gate_id so big? 48 bytes??
  • The first query (NOT IN) uses the index ("Using index"), hence reasonably efficient for 527347 random lookups.
  • The second query (LEFT JOIN) loads the entire index. This may be more efficient than reaching into the table 527347 times, depending on what is cached.

A third approach:

select  *
    from  worker_location AS wl
    where  NOT EXISTS ( SELECT 1 FROM worker_address WHERE gate_id = wl.gate_id );

How long does each variant take?

For further discussion, please provide SHOW CREATE TABLE for both tables and the value of innodb_buffer_pool_size. This may lead to other optimization techniques.

Rick James
  • 135,179
  • 13
  • 127
  • 222