4

I am a newbie in MySQL optimization, i found a amazing thing:sql with 'exists' run slower than using 'in' !!!

following is my DDL:

mysql> `show create table order\G`;
*************************** 1. row ***************************
       Table: order
Create Table: CREATE TABLE `order` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `employee_id` int(4) NOT NULL,
  `price` decimal(7,2) NOT NULL,
  `trade_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `fk_employee_id` (`employee_id`),
  CONSTRAINT `fk_employee_id` FOREIGN KEY (`employee_id`) REFERENCES `employee` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=231001 DEFAULT CHARSET=utf8



mysql> `show create table order_detail\G`;
*************************** 1. row ***************************
       Table: order_detail
Create Table: CREATE TABLE `order_detail` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `menu_id` int(4) NOT NULL,
  `order_id` int(4) NOT NULL,
  `amount` int(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_menu_id` (`menu_id`),
  KEY `fk_order_id` (`order_id`),
  CONSTRAINT `fk_menu_id` FOREIGN KEY (`menu_id`) REFERENCES `menu` (`id`),
  CONSTRAINT `fk_order_id` FOREIGN KEY (`order_id`) REFERENCES `order` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037992 DEFAULT CHARSET=utf8



Query Solution 1: use exists
---------------------------------

mysql> `SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 );`
+-------------------+
| count(`order`.id) |
+-------------------+
|             92054 |
+-------------------+
1 row in set (***6.53 sec***)

mysql> `explain SELECT count(`order`.id) FROM `order` WHERE EXISTS ( SELECT 1 FROM order_detail WHERE order_detail.order_id = `order`.id GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 )\G;`
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: order
         type: index
possible_keys: NULL
          key: fk_employee_id
      key_len: 4
          ref: NULL
         rows: 231032
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: order_detail
         type: ref
possible_keys: fk_order_id
          key: fk_order_id
      key_len: 4
          ref: performance_test.order.id
         rows: 3
        Extra: Using where; Using index
2 rows in set (0.01 sec)


Query solution 2: use in
------------------------

mysql> `SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) ;`
+-------------------+
| count(`order`.id) |
+-------------------+
|             92054 |
+-------------------+
1 row in set (***3.88 sec***)

mysql> `explain SELECT count(`order`.id) FROM `order` WHERE `order`.id IN ( SELECT order_detail.order_id FROM order_detail GROUP BY (order_detail.order_id) HAVING COUNT(order_id) > 5 ) \G;`<br>
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: order
         type: index
possible_keys: NULL
          key: fk_employee_id
      key_len: 4
          ref: NULL
         rows: 231032
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: order_detail
         type: index
possible_keys: fk_order_id
          key: fk_order_id
      key_len: 4
          ref: NULL
         rows: 1036314
        Extra: Using index
2 rows in set (0.00 sec)
Arun Killu
  • 13,581
  • 5
  • 34
  • 61
lost_in
  • 75
  • 2
  • 7
  • this is a test project, if you want to reproduce the situation , i can export the test Data and share it – lost_in Nov 24 '13 at 08:16
  • 1
    i think the problem is with dependent sub query – Arun Killu Nov 24 '13 at 08:22
  • 2
    Have you tried to reverse the order of execution: before the IN and after the EXIST, maybe a cached execution plan (from 1st query) that the second query use. – Max Nov 24 '13 at 10:54
  • @Max yes, i have run the two queries again and again,Exists always spend more time than IN – lost_in Nov 24 '13 at 11:42
  • there are only 1,000,000 records in database, both of those two plans take more than 2 seconds to complete, can we optimize it to complete in 1 second – lost_in Nov 24 '13 at 11:55
  • 1
    I think EXISTS query execute a JOIN on outer query consequently there are more read from database (http://sqlmag.com/database-performance-tuning/advanced-join-techniques). IN query instead first execute the subquery (read all recs) after filter the outer query using index. – Max Nov 24 '13 at 13:02
  • 2
    I believe you need to read other similar posts on the subject, here is one http://stackoverflow.com/questions/3999600/mysql-difference-between-in-and-exist it is quite detailed – Noam Rathaus Nov 24 '13 at 13:12

2 Answers2

2

I think you have a little confusion,you are having a wrong idea,'EXISTS' works faster than 'IN' and I'm trying to make you understood the reason..

EXISTS returns a boolean, and will return a boolean on the first match. So if you're dealing with duplicates/multiples, 'EXISTS' will be faster to execute than 'IN' or 'JOIN' depending on the data and the needs.

Whereas, 'IN' is syntactic sugar for OR clauses. While it's very accommodating, there are issues with dealing with lots of values for that comparison (north of 1,000).In case of duplicates/multiples 'IN' checks all the values that exist which naturally consumes more time to execute than 'EXISTS',that's why 'IN' is always comparatively slower than 'EXISTS'.

I hope that I clarified your confusion.

Abhik Dey
  • 403
  • 4
  • 12
  • thanks for your reply , i have googled more material on internet, maybe the essential difference is not 'Exists' and 'IN', but the effect of 'Group By' to the SubQuery, since i have used 'Group By' clause, so the shortcut for 'Exists' not exist, both of those two query have to full scan the table or the index.I fond the output of Explain of those two query is a little different, the Select_Type of SubQuery in 'Exists' is 'Dependent Subquery', but the other is 'SubQuery' – lost_in Nov 27 '13 at 13:33
2

'EXISTS' can work faster than 'IN' if the parent and sub sets are both large.

  • because O(n*log(n)) is faster than O(n*n) for large sets

'EXISTS' can work slower than 'IN' if the sub set is small.

  • because of query structure and resulting query planner changes; eg 30%(1.5s) slower.
user1133275
  • 2,642
  • 27
  • 31