6

I'm new to query optimizations so I accept I don't understand everything yet but I do not understand why even this simple query isn't optimized as expected.

My table:

+------------------+-----------+------+-----+-------------------+----------------+
| Field            | Type      | Null | Key | Default           | Extra          |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11)   | NO   | PRI | NULL              | auto_increment |
| taskid           | int(11)   | NO   | MUL | NULL              |                |
| transitiondate   | timestamp | NO   | MUL | CURRENT_TIMESTAMP |                |
+------------------+-----------+------+-----+-------------------+----------------+

My indexes:

+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name          | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions |          0 | PRIMARY           |            1 | tasktransitionid | A         |         952 |     NULL | NULL   |      | BTREE      |         |               |
| tasktransitions |          1 | transitiondate_ix |            1 | transitiondate   | A         |         952 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

My query:

SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';

gives this:

+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys     | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tasktransitions | ALL  | transitiondate_ix | NULL | NULL    | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+

If I understand everything correctly Using where and ALL means that all rows are retrieved from the storage engine and filtered at server layer. This is sub-optimal. Why does it refuse to use the index and only retrieve the requested range from the storage engine (innoDB)?

Cheers

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Tin
  • 699
  • 1
  • 6
  • 19

4 Answers4

10

MySQL will not use the index if it estimates that it would select a significantly large portion of the table, and it thinks that a table-scan is actually more efficient in those cases.

By analogy, this is the reason the index of a book doesn't contain very common words like "the" -- because it would be a waste of time to look up the word in the index and find the list of page numbers is a very long list, even every page in the book. It would be more efficient to simply read the book cover to cover.

My experience is that this happens in MySQL if a query's search criteria would match greater than 20% of the table, and this is usually the right crossover point. There could be some variation based on the data types, size of table, etc.

You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:

SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks @Bill, this seems indeed to be the reason. The query returns 186 rows out of 987 rows. When I change the date such that only 10 rows match, it uses the index. What I still don't understand is that `using where` is in explain's output when the key is being used. – Tin Nov 04 '13 at 18:07
  • 2
    `Using where` simply means that the SQL execution is applying a condition to filter out some rows. In other words, not all rows examined are returned. – Bill Karwin Nov 04 '13 at 18:11
  • I'm reading the book High Performance MySql from Baron Schwartz. According to this book `Using where` means that too many rows are read from the storage engine and a better suited index will probably solve this. But I don't understand how to. – Tin Nov 04 '13 at 18:30
  • To be correct: sometimes a better suited index is available when `using where` is reported. – Tin Nov 04 '13 at 18:33
  • In your example you're searching only a single column, and it's already indexed. So there's no other opportunity to add an index that would be relevant. You're just searching for a range of dates that encompasses a significant part of the table. – Bill Karwin Nov 04 '13 at 18:34
  • Also read the description of `Using where` in http://dev.mysql.com/doc/refman/5.6/en/explain-output.html – Bill Karwin Nov 04 '13 at 18:34
1

I once was trying to join two tables and MySQL was refusing to use an index, resulting in >500ms queries, sometimes a few seconds. Turns out the column I was joining on had different encodings on each table. Changing both to the same encoding sped up the query to consistently less than 100ms.

AlexDev
  • 4,049
  • 31
  • 36
0

Just in case, it helps somebody.

I have a table with a varchar column _id (long int coded as string). I added an index for this column, but query was still slow. I was executing this query:

select * from table where (_id = 2221835089) limit 1

I realized that the _id column wasn't been generated as string (I'm Laravel as DB framework). Well, if query is executed with the right data type in the where clause everything worked like a charm:

select * from table where (_id = '2221835089') limit 1

Dharman
  • 30,962
  • 25
  • 85
  • 135
vegetable
  • 313
  • 3
  • 13
  • Thank you. This did indeed help with a seemingly trivial query refusing to use an index - even if FORCEd. – DaveW Oct 14 '22 at 17:22
0

I am new at my MySQL 8.0, have finished 2 simple tutorials completely, and there is only two subjects that has not worked for me, one of them is indexing. I read the section labeled "2 Answers" and found that using the statement suggested at the end of said section, seems to defeat the purpose of the original USE INDEX or FORCE INDEX statement below. The suggested statement is like getting a table sorted via a WHERE statement instead of MySQL using USE INDEX or FORCE INDEX. It works, but seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does any one knows why MySQL is ignoring my simple request to index a 10 row table on the Lname column?

Field Type Null Key Default Extra
ID int NO PRI Null auto_increment
Lname varchar(20) NO MUL Null
Fname varchar(20) NO Mul Null
City varchar(15) NO Null
Birth_Date date NO Null
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst); 
weirdan
  • 2,499
  • 23
  • 27