mysql difference in number of rows scanned

Question

I have a query :

mysql> explain SELECT  *
    FROM  OTHERS_TINY_URL_TBL
    WHERE  LINK_TYPE = 'BITLY'
      AND  URL_SHORTNER_ID = '5434e3b9e4b03aa06f25da11'
      AND  MODIFIED_TM >= '2016-03-13 21:09:43'
      AND  MODIFIED_TM <= '2017-03-13 21:09:43'
      AND  POST_ID < 0
    ORDER BY  MODIFIED_TM DESC
    LIMIT  1000\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: OTHERS_TINY_URL_TBL
         type: range
possible_keys: XIE1_OTHERS_TINY_URL_TBL,XIE2_OTHERS_TINY_URL_TBL,XIE5_OTHERS_TINY_URL_TBL
          key: XIE5_OTHERS_TINY_URL_TBL
      key_len: 4
          ref: NULL
         rows: 47168
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

ERROR: 
No query specified

I don't get why it is scanning so many rows. It seems ORDER BY and LIMIT is making it inefficient

Without ORDER BY and LIMIT:

mysql> explain SELECT  *
    FROM  OTHERS_TINY_URL_TBL
    WHERE  LINK_TYPE = 'BITLY'
      AND  URL_SHORTNER_ID = '5434e3b9e4b03aa06f25da11'
      AND  MODIFIED_TM >= '2016-03-13 21:09:43'
      AND  MODIFIED_TM <= '2017-03-13 21:09:43'
      AND  POST_ID < 0\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: OTHERS_TINY_URL_TBL
         type: range
possible_keys: XIE1_OTHERS_TINY_URL_TBL,XIE2_OTHERS_TINY_URL_TBL,XIE5_OTHERS_TINY_URL_TBL
          key: XIE2_OTHERS_TINY_URL_TBL
      key_len: 9
          ref: NULL
         rows: 4950
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

ERROR: 
No query specified

Only 85 rows in the filtered set:

mysql>  SELECT  count(*)
    FROM  OTHERS_TINY_URL_TBL
    WHERE  LINK_TYPE = 'BITLY'
      AND  URL_SHORTNER_ID = '5434e3b9e4b03aa06f25da11'
      AND  MODIFIED_TM >= '2016-03-13 21:09:43'
      AND  MODIFIED_TM <= '2017-03-13 21:09:43'
      AND  POST_ID < 0\G;
*************************** 1. row ***************************
count(*): 85
1 row in set (0.02 sec)

ERROR: 
No query specified

Creating a nested query scanned on lesser number of rows:

mysql> explain SELECT  *
    FROM  
    (
        SELECT  *
            from  OTHERS_TINY_URL_TBL
            where  URL_SHORTNER_ID = '5434e3b9e4b03aa06f25da11'
              AND  MODIFIED_TM >= '2016-03-13 21:09:43'
              AND  MODIFIED_TM <= '2017-03-13 21:09:43'
              AND  POST_ID < 0
    ) inner_t
    where  true
    ORDER BY  MODIFIED_TM DESC
    LIMIT  1000\G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4950
        Extra: Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: OTHERS_TINY_URL_TBL
         type: range
possible_keys: XIE2_OTHERS_TINY_URL_TBL,XIE5_OTHERS_TINY_URL_TBL
          key: XIE2_OTHERS_TINY_URL_TBL
      key_len: 9
          ref: NULL
         rows: 4950
        Extra: Using index condition; Using where
2 rows in set (0.00 sec)

ERROR: 
No query specified

Why is the first query inefficient?

Please provide `SHOW CREATE TABLE` -- How are we supposed to know what the chosen indexes are! — Rick James, Mar 30 '17 at 22:19

score 2 · Answer 1 · answered Mar 28 '17 at 11:05

First of all, it is important to note that the number in rows is just an estimate, based on some statistical data, and, depending on your MySQL version, some random lookups into your index during planning phase. It can be completely wrong sometimes (and, based on these estimates, MySQL might choose a slower execution plan). That is why, if you know something about your data that MySQL doesn't, it is possible to optimize queries by e.g. forcing an index that MySQL does not want to choose.

Also, the only relevant value that MySQL tries to minimize is the overall execution time. It might correlate with the number of rows it has to read, but doesn't have to (e.g. if you don't have to do a filesort). And even if it generally might choose a good strategy, it can depend on your actual data if it actually is the fastest. Try e.g. limit 10 to simulate that instead of 85, 4000 rows would fulfill your search condition. This would solely depend on your data, not the query itself - and MySQL has to decide how to execute the query before it executes the query.

But in general, it is expected that your queries will have a different number in rows, since your queries are using different indexes.

The first query will use an index on MODIFIED_TM that goes through all rows in the given date range. The estimate is that 47168 rows lie in this range. For all these rows, it will check the other columns for the other conditions.

Your second query will use an index on a different column, probably LINK_TYPE. You would have to add your index definitions to make it clear, and it can e.g. be URL_SHORTNER_ID too, but for the following, I assume the index is (only) on LINK_TYPE. The estimate is that 4950 rows have this LINK_TYPE. This is actually very often a bad estimate, and you can check the correct number by e.g. SELECT count(*) FROM OTHERS_TINY_URL_TBL WHERE LINK_TYPE = 'BITLY'. Every of these rows will then have to be checked if they match the other conditions.

MySQL doesn't know yet that only 85 rows will finally be found, and that it might be faster to check 4950 rows and order the remaining 85 rows than to check 47168 rows that are already in the correct order (and so do not have to be ordered afterwards, which is a relatively slow operation which MySQL usually tries to prevent). Also you might get lucky: it can happen that the first 1000 of these 47168 rows already have to correct link_type, url_shortener_id and post_id, so it would have to only inspect 1000 rows instead of checking 4950 rows plus doing the sorting. But that will depend only on your data. If you know something about your data that MySQL doesn't, you should use optimizer hints to e.g. force a different index. Or trick MySQL e.g. by using your third query - although MySQL 5.7 might trick you back, because it should actually optimize the subquery away.

Fortunately, there is a better solution. There is a perfect index for your first query (which will also improve your second query at the same time): OTHERS_TINY_URL_TBL(URL_SHORTNER_ID, LINK_TYPE, MODIFIED_TM).

mysql difference in number of rows scanned

1 Answers1