6

There is a typical Users table, which contains fields: id (primary), application_id, login, phone, etc. (application_id - selective field)

There are few indexes:

index_users_on_application_id,
unique_index_users_on_application_id_and_login
unique_index_users_on_application_id_and_phone

The query itself is very simple:

SELECT  `users`.*
FROM `users`
WHERE `users`.`application_id` = 1234
LIMIT 10 OFFSET 0;

The tricky part is that this query uses one of two unique indexes (unique_index_users_on_application_id_and_login for example), and then returns list of users sorted by login. But I need them sorted by id.

For that purpose, I've updated the query:

SELECT  `users`.*
FROM `users`
WHERE `users`.`application_id` = 1234
ORDER BY id
LIMIT 10 OFFSET 0;

Well, now explain shows that MySQL starts using PRIMARY key instead of any indexes. But how did that happen? If index_users_on_application_id should in fact contain two fields: [application_id, id] (InnoDB), so that index is perfect for the query, but MySQL decides to chose another one.

If I say IGNORE INDEX(PRIMARY), MySQL starts using unique_index_users_on_application_id_and_login, still ignoring the correct index. Same result when ORDER BY id+0.

I also tried to ORDER BY application_id, id to make sure index fits the best, MySQL still selects wrong index.

Any ideas, why is it happening and how to ensure MySQL to use proper index without explicitly say USE INDEX(index_users_on_application_id)?

Full list of indexes for Users table:

mysql> show indexes from users;
+-------+------------+-----------------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name                                            | Seq_in_index | Column_name          | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users |          0 | PRIMARY                                             |            1 | id                   | A         |       21893 |     NULL | NULL   |      | BTREE      |         |               |
| users |          0 | index_users_on_confirmation_token                   |            1 | confirmation_token   | A         |          28 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          0 | index_users_on_reset_password_token                 |            1 | reset_password_token | A         |          50 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          0 | index_users_on_application_id_and_external_user_id  |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          0 | index_users_on_application_id_and_external_user_id  |            2 | external_user_id     | A         |         995 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          0 | index_users_on_application_id_and_login             |            1 | application_id       | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          0 | index_users_on_application_id_and_login             |            2 | login                | A         |       21893 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | users_account_id_fk                                 |            1 | account_id           | A         |          44 |     NULL | NULL   |      | BTREE      |         |               |
| users |          1 | users_blob_id_fk                                    |            1 | blob_id              | A         |         118 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_remember_token                       |            1 | remember_token       | A         |           8 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id                       |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_facebook_id       |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_facebook_id       |            2 | facebook_id          | A         |        3127 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_twitter_digits_id |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_twitter_digits_id |            2 | twitter_digits_id    | A         |         138 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_email             |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_email             |            2 | email                | A         |        2189 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_full_name         |            1 | application_id       | A         |          32 |     NULL | NULL   | YES  | BTREE      |         |               |
| users |          1 | index_users_on_application_id_and_full_name         |            2 | full_name            | A         |        5473 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+-----------------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
19 rows in set (0.01 sec)

Example of EXPLAIN:

mysql> EXPLAIN SELECT  `users`.* FROM `users` WHERE `users`.`application_id` = 56374  ORDER BY id asc LIMIT 1 OFFSET 0;
+----+-------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys                                                                                                                                                                                                                                                                                                  | key                                                | key_len | ref   | rows | Extra                       |
+----+-------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | users | ref  | index_users_on_application_id_and_external_user_id,index_users_on_application_id_and_login,index_users_on_application_id,index_users_on_application_id_and_facebook_id,index_users_on_application_id_and_twitter_digits_id,index_users_on_application_id_and_email,index_users_on_application_id_and_full_name | index_users_on_application_id_and_external_user_id | 5       | const |    1 | Using where; Using filesort |
+----+-------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+-------+------+-----------------------------+
1 row in set (0.00 sec)

The problem itself is that using wrong index causes queries like that (with a limit of 100 instead of 1) to be performed MINUTES, while with a correct index it is a matter of split second.

Profiling:

SET PROFILING = 1; SELECT `users`.* FROM `users` WHERE `users`.`application_id` = 56374  ORDER BY id asc LIMIT 1 OFFSET 0; SHOW PROFILE FOR QUERY 1; SET PROFILING = 0;
Query OK, 0 rows affected (0.00 sec)

+----------+----------+-----------------+-------+-------+----------------------------------------------------------------------------------------------------------------------------------+----------------------+-------+---------+---------------------+---------------------+---------------------+------------------+-------------+------------+---------+----------------------+------------------------+---------------------+--------------------+---------------------+----------------------+-------------------+------------+----------------+-------------+---------------------------------------------------------------------------------------+-------------------+-------------------------+--------------------------------------------------+----------------+
-- fields list --
+----------+----------+-----------------+-------+-------+----------------------------------------------------------------------------------------------------------------------------------+----------------------+-------+---------+---------------------+---------------------+---------------------+------------------+-------------+------------+---------+----------------------+------------------------+---------------------+--------------------+---------------------+----------------------+-------------------+------------+----------------+-------------+---------------------------------------------------------------------------------------+-------------------+-------------------------+--------------------------------------------------+----------------+
| 27265241 |     NULL | Some Username | NULL  | 9777  | SomeHash | AnotherHash | NULL  | NULL    | 2017-04-12 15:53:32 | 2017-09-21 13:39:51 | 2017-09-24 19:19:06 |             1234 | NULL        | NULL       |    NULL | NULL                 | NULL                   | NULL                | NULL               | 2017-07-05 10:59:59 | NULL                 | NULL              |      12345 | NULL           | NULL        | something_else | NULL              |                       1 | another_hash |          54321 |
+----------+----------+-----------------+-------+-------+----------------------------------------------------------------------------------------------------------------------------------+----------------------+-------+---------+---------------------+---------------------+---------------------+------------------+-------------+------------+---------+----------------------+------------------------+---------------------+--------------------+---------------------+----------------------+-------------------+------------+----------------+-------------+---------------------------------------------------------------------------------------+-------------------+-------------------------+--------------------------------------------------+----------------+
1 row in set (1 min 14.43 sec)

+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| starting                       |  0.000068 |
| Waiting for query cache lock   |  0.000025 |
| init                           |  0.000025 |
| checking query cache for query |  0.000047 |
| checking permissions           |  0.000026 |
| Opening tables                 |  0.000031 |
| After opening tables           |  0.000025 |
| System lock                    |  0.000025 |
| Table lock                     |  0.000026 |
| Waiting for query cache lock   |  0.000037 |
| init                           |  0.000046 |
| optimizing                     |  0.000032 |
| statistics                     |  0.000225 |
| preparing                      |  0.000042 |
| executing                      |  0.000025 |
| Sorting result                 |  0.000057 |
| Sending data                   | 42.952100 |
| end                            |  0.000070 |
| query end                      |  0.000027 |
| closing tables                 |  0.000025 |
| Unlocking tables               |  0.000028 |
| freeing items                  |  0.000028 |
| updating status                |  0.000039 |
| cleaning up                    |  0.000025 |
+--------------------------------+-----------+
24 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)
nattfodd
  • 1,790
  • 1
  • 17
  • 35
  • It's very hard to follow your question - can you please post the full DDL for this table, including indexes, and show the query plan? – Neville Kuyt Sep 26 '17 at 09:18
  • Please post the result of `show indexes from users;` – fancyPants Sep 26 '17 at 09:24
  • 1
    added both list of indexes and explain example – nattfodd Sep 26 '17 at 09:34
  • I'm confused by what the problem is in this question. Is it an OCD type of problem or is there an actual problem related to performance or MySQL returning records that you aren't expecting? Also, what made you create so many indexes? MySQL is correct. You're performing a query ordered by `id`. MySQL will use an index here (primary). Your `application_id` index has cardinality of 30 of 21893 (30 unique records across ~22k records) so it won't be used. It's unclear what *wrong index* means in your question. The optimizer is always right as it wants to do the least amount of work possible. – Mjh Sep 26 '17 at 10:05
  • 1
    @Mjh I guess the problem here is that this query (with selected PRIMARY index) takes ~30s. But when use it with USE INDEX(index_users_on_application_id) then it takes 0 seconds. The question is why? Why MySQL chooses that slow index here? – Rubycon Sep 26 '17 at 10:23
  • @IgorKhomenko - shouldn't the select be done with `SQL_NO_CACHE` in that case so we ensure that faster result isn't happening because of already cached result that was there because of initial slower query? If the primary key is used by optimizer, there's no other index that can perform better. This query could be tweaked though, MySQL will inspect every row that exists and then discard everything except 10 records. There are cheap tricks that can help MySQL scan a lower amount of records, and it's not swapping the index. – Mjh Sep 26 '17 at 10:33
  • 1
    @Mjh this is a real world question, difference between "proper" and "incorrect" indexes is like minutes of a query execution time. – nattfodd Sep 26 '17 at 10:34
  • @nattfodd I'm not here to get clogged up in semantics, I'd like to provide help, but don't you agree that for that - we have to establish the problem. The problem is that the query is slow. You tried to see what's going on using explain and by swapping indexes around. But, your method *might* have fallacies which could provide false insights. What I'm trying to assert is what you're trying to achieve (faster execution obviously). I'm not here to advocate approach A vs approach B or one index being "better" than the other. Can you run your query again, but clearing the cache first? – Mjh Sep 26 '17 at 10:37
  • @Mjh `EXPLAIN SELECT SQL_NO_CACHE `users`.* FROM `users` WHERE `users`.`application_id` = 56374 ORDER BY id asc LIMIT 1 OFFSET 0;` returns `index_users_on_application_id_and_external_user_id | 5 | const | 1 | Using where; Using filesort ` , the same as without SQL_NO_CACHE. We've tried to restart MySQL server, and I'm running the query now on a few different servers. No difference :( – nattfodd Sep 26 '17 at 10:52
  • But what's the speed of the query with correct index and incorrect index? Is it still faster with the index that you force versus primary key? You can always resort to a trick to help MySQL go through these records faster by limiting the range, adding `WHERE id > 20000 ORDER BY id DESC LIMIT 1`. The number I put there is arbitrary naturally, that's the part that's tricky - determining which number to put there and how to do it (hardcoded, dynamically etc.) – Mjh Sep 26 '17 at 10:59
  • @Mjh I've updated question. With a primary key as index query takes ~1.5 minutes to proceed. While with `index_users_on_application_id` it takes milliseconds. And since queries like that are quite common, you can image how much it loads the database server. For now, I don't see a better solution rather than specify index manually - `USE INDEX(index_users_on_application_id)`. Ugly as hell, but still viable. – nattfodd Sep 26 '17 at 12:00
  • @nattfodd - of course, the situation is absolutely horrible, but what you ran into might be uncovering a different set of problems. Out of curiosity, can your data set fit to the buffer pool? If it could, scanning and ordering these rows based on PK should be instant, much like with the other index. I'd suggest you use MySQL's profiling (`SET PROFILING = 1; SELECT ... ; SHOW PROFILE FOR QUERY 1; SET PROFILING = 0;`) to see what goes on in detail. What's the value of `innodb_buffer_pool_size`? – Mjh Sep 26 '17 at 12:35
  • @Mjh I've added profiling output to the question, please take a look. – nattfodd Sep 26 '17 at 14:41
  • Please provide `SHOW CREATE TABLE` - need to see datatypes, engine, etc. – Rick James Sep 26 '17 at 21:23
  • 1
    Notice how useless `Profile` is? It usually says the most time is in `Sending data`. (It turns out, that is not necessarily what it is up to.) – Rick James Sep 26 '17 at 21:24
  • Question: Is it faster when it uses the "correct" index? – Rick James Sep 26 '17 at 21:26
  • `SQL_NO_CACHE` does _not_ impact the `EXPLAIN`. But it may impact the timing of the actual query. – Rick James Sep 26 '17 at 21:30
  • What is the value of `innodb_buffer_pool_size`? How much RAM do you have? Please provide `SHOW TABLE STATUS LIKE 'users';` (I'm looking for some caching issues.) – Rick James Sep 26 '17 at 21:32
  • Run `ANALYZE TABLE users;` Then rerun the timing tests. This _may_ be a rare case of the statistics being out of whack. Also, what version are you running? – Rick James Sep 26 '17 at 21:35

3 Answers3

1

You should be able to use index-hints and optimizer hints to suggest correct index usage:

Index hints

Optimizer hints

You could hint directly the table:

tbl_name [[AS] alias] [index_hint_list]

index_hint_list:
    index_hint [index_hint] ...

index_hint:
    USE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

index_list:
    index_name [, index_name] ...

In your case I think the best solution would be to name the attributes by name and not to use star * and to use IGNORE INDEX (unique_index_users_on_application_id_and_login, unique_index_users_on_application_id_and_phone ) FOR ORDER BY directly at the query:

An example based on your code:

SELECT  u.id,
        u.application_id,
        u.login,
        u.phone,
        # ... here to continue
FROM users as u
IGNORE INDEX (unique_index_users_on_application_id_and_login, unique_index_users_on_application_id_and_phone ) FOR ORDER BY
WHERE u.application_id = 1234
ORDER BY u.id
LIMIT 10 OFFSET 0;

First Edit
Due to a comment below I'm adding a trick with invalidating a primary key.

You could also go a way of invalidating the primary key by this trick:

SELECT u.id, 
       u.application_id,
       u.login,
       u.phone,
       #...
FROM users as u
WHERE u.application_id = 1234
ORDER BY u.id+0
tukan
  • 17,050
  • 1
  • 20
  • 48
  • 1
    Shouldn't USE INDEX or even FORCE INDEX (correct_index) be more effective in that case, as you're already specifying the indices? – Milan Velebit Sep 26 '17 at 09:17
  • @MilanVelebit Yes Use & Force would be more effective. I'm adhering to wish `index without explicitly say USE INDEX(index_users_on_application_id)`. – tukan Sep 26 '17 at 09:34
  • Seeing you answered the question, can you please enlighten me about what the actual problem is and what your answer solves for the OP? – Mjh Sep 26 '17 at 10:09
  • @Mjh Since there were no edits before my answer, I had to best guess that nattfold wants to use index against optimizer (answered before edits). My answer enables him to do so, even when it is more expensive as I see now (again no edits before). – tukan Sep 26 '17 at 10:20
  • Yes, but I'm asking if you understand what his problem actually is? I don't. I don't understand what happens if he forces an index or uses another one. What problem is being solved? So far, I have never encountered a case where MySQL has to be explicitly told not to use the primary key in favor of some other index. – Mjh Sep 26 '17 at 10:22
  • @Mjh In rare cases MySQL can use wrong index. If he is right and the optimizer is wrong he should get much faster response. In other scenarios he will get sub-optimal solution. He could also invalidate the use of the primary key (see my edit). – tukan Sep 26 '17 at 10:31
0

It's hard to be certain, but in very many cases, MySQL is good at selecting the correct index.

It may be necessary to update the statistics for the query analyzer - occasionally the query analyzer doesn't have good information on the data, and this can lead to weird behaviour.

However...your index_users_on_application_id only has application_id, but not ID.

I'm guessing that application_id has a fairly low number of distinct values. ColumnID is unique, and has as many values as there are rows in the table. There is no index that includes both the where clause and the order by clause, so MySQL guesses that the most expensive part will be ordering by ID, rather than filtering by application_id.

So, I think MySQL is doing the right thing here. Behind the scenes, it's returning all the rows ordered by ID, and going through that list and giving you the first 10 with the specified application_ID.

The obvious thing to do is to create an index with application_id and ID, and see if MySQL picks that.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • So you propose to add another index on application_id+id and it will work? I thought the index merger does the same under the hood – Rubycon Sep 26 '17 at 10:01
  • My guess is that MySQL has decided that there aren't enough distinct values in application_id to make the index merge worthwhile. – Neville Kuyt Sep 26 '17 at 10:14
  • 1
    > The obvious thing to do is to create an index with application_id and ID, and see if MySQL picks that. I did that too, it does not. Which is obvious, since `index_users_on_application_id` is the same (it CONTAINS `id` by definition of secondary index) – nattfodd Sep 26 '17 at 10:28
  • In that case, as @mjh writes, the cardinality of application_id is too low for the index to be useful. – Neville Kuyt Sep 26 '17 at 10:36
  • The OP has implied that the table is InnoDB. If so, then `INDEX(app_id)` has `id` (the PK) _implicitly_ tacked on. As he said, it is effectively `INDEX(app_id, id)`, so it _could_ and _should_ have used that index. – Rick James Sep 26 '17 at 21:43
  • @IgorKhomenko - "Index merge" is almost always slower that having a suitable 'composite' index (which is what we are discussing). Index merge uses two indexes, gathers info from both, ANDs or ORs the two, _then_ reaches into the data. – Rick James Sep 26 '17 at 21:48
  • @NevilleKuyt - Cardinality=30, so I would expect it to use the composite index. – Rick James Sep 26 '17 at 21:50
-1

When you use ORDER BY column_name MySQL scan all data in column given in ORDER BY clause, here in your case id column will get scan. For this database use Table pointer. In InnoDB, it is the value of the PRIMARY KEY and in MyISAM, it is an offset in the .MYD file.

That's why when you use ORDER BY id your query start using only primary key as index.

To use you created index add id column as first column of index. Then it will use index effectively. So your index unique_index_users_on_application_id_and_login on users table should contains columns as follows in same sequence 1- id & 2- application_id.

For more details about MySQL performance of ORDER BY/ LIMIT Go here

Santosh D.
  • 537
  • 6
  • 19
  • I think you are wrong on multiple issues. `ORDER BY id` may make the PK _preferred_, but not required. Adding `id` at the start of an index makes it no better than the PK. When filtering via `=` and ordering by something else, the filtering column _must_ be first in the index. – Rick James Sep 26 '17 at 21:40
  • I know its not better to add 'id' at start of an index but @nattfodd specifically ask how to use indexing effectively without explicitly say USE INDEX(index_users_on_application_id). This can only be achieve by adding 'id' at start of an index. – Santosh D. Sep 27 '17 at 04:30