0

I'm working with mysql 5.5.52 on a Debian 8 machine and sometimes we have a slow query (>3s) that usually spends 0.1s. I've started with the explain command to find what is happening.

This is the query and the explain info

explain
SELECT
   `box`.`message_id` ID
 , `messages`.`tipo`
 , `messages`.`text`
 , TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box`)
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0 
    AND `messages`.`deleted` = 0 
    AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `messages`.`id` ASC LIMIT 100;


id| select_type| table  | type |  possible_keys   | key   | key_len| ref          | rows | Extra    
 1|SIMPLE      |box     |ref   |user_id,message_id|user_id|       4|const         | 2200 |Using where; Using temporary; Using filesort   
 1|SIMPLE      |messages|eq_ref|PRIMARY           |PRIMARY|       4|box.message_id|    1 |Using where

I know that temporary table and filesort are a bad thing, and I suppose that the problem is that order key doesn't belong to the first table in the query (box) and changing it to box.message_id, the explain info is

id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra

1 SIMPLE box index user_id,message_id message_id 4 443 Using where

1 SIMPLE messages eq_ref PRIMARY PRIMARY 4 box.message_id 1 Using where

It looks better, but I don't understand why it's using the message_id index, and worst, now the query takes 1.5s instead of initial 0.1s

Edit:

Forcing the query to use user_id index, I get the same result (0.1s) as the initial query but without the temporary

explain
SELECT
   `box`.`message_id` ID
 , `messages`.`tipo`
 , `messages`.`text`
 , TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box` use index(user_id) )
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0 
    AND `messages`.`deleted` = 0 
    AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `box`.`message_id` ASC LIMIT 100;

id| select_type| table  | type |  possible_keys   | key   | key_len| ref          | rows | Extra    
 1|SIMPLE      |box     |ref   |user_id,message_id|user_id|       4|const         | 2200 |Using where; Using filesort   
 1|SIMPLE      |messages|eq_ref|PRIMARY           |PRIMARY|       4|box.message_id|    1 |Using where

I think that skipping temporary is better solution than the initial query, next step is check combined index as ysth recommends.

Community
  • 1
  • 1
OscarGz
  • 1
  • 3
  • Could you show the query that produced the second explain? – ysth Nov 29 '16 at 10:15
  • Ah, you mean you changed the order by from message.id to box.message_id? – ysth Nov 29 '16 at 10:31
  • `AND \`messages\`.\`date\` + INTERVAL 10 MINUTE > NOW()` -> `AND \`messages\`.\`date\` > NOW() - INTERVAL 10 MINUTE`. Do not do any transformation on columns when it is not necessary. (This may not be an issue in this case, but just an FYI). – Pred Nov 29 '16 at 10:40
  • Please provide `SHOW CREATE TABLE` for the two tables. – Rick James Nov 30 '16 at 20:04
  • any more progress on this? – ysth Dec 02 '16 at 02:06
  • @ysth I'm bussy with a more urgent issue, I'm using the "use index" solution untill I'll can check the combined index one, I want to be sure before changing database. – OscarGz Dec 02 '16 at 10:15
  • ah, so maybe the slow queries were when it decided to use the message_id index and the fast ones when it decided to use the user_id index? I don't know why the difference in using temporary or not; that is very strange. – ysth Dec 02 '16 at 17:27
  • Try this: Add to `messages`: `INDEX(deleted, date)`. – Rick James Dec 05 '16 at 19:44

2 Answers2

0

it is not a good idea to calculate on fieldvalues to compare. then you get a FULL TABLE SCAN. MySQL must do it for each ROW before it can check the condition. Its better to do it on the constant piece of condition. Then MySQL can use a Index (if there one on this field)

change from

AND messages.date + INTERVAL 10 MINUTE > NOW() 

to

AND messages.date  > NOW() - INTERVAL 10 MINUTE
Bernd Buffen
  • 14,525
  • 2
  • 24
  • 39
  • But he isn't getting a full table scan; date is not a index here. – ysth Nov 29 '16 at 10:12
  • I know that calculations don't work with index, but that field (date) is not indexed and in the modified query, optimizer is using index, but for me the wrong one (message_id) – OscarGz Nov 29 '16 at 10:12
  • Well, add `INDEX(user_id, deleted, date)` -- then the tmp+filesort + table scan will go away. – Rick James Nov 29 '16 at 19:16
  • @RickJames user_id is in a different table. If it could be added to messages, that would probably be an ideal index, but I'm guessing it can't. – ysth Nov 30 '16 at 07:53
  • Messages table can not contain user_id because some messages are for groups or all users, but INDEX(user_id, deleted, date) it's ok for box table, but I prefer to avoid using many specific index if it's possible. As ysth said "that might also end up worse. Depends on your exact data." – OscarGz Dec 02 '16 at 10:30
0

Temporary and file sort are not bad here; they are needed because using the best index (user_id) doesn't naturally produce records sorted in the order you ask for.

It's possible you might do better having a combined user_id,message_id index, but that might also end up worse. Depends on your exact data.

It isn't clear to me if you are seeing longer queries for certain user id's or the same user id sometimes taking much longer.

Update: it seems likely that having a combined index and changing order by to box.user_id,box.message_id will solve your problem, at least for users that don't have a large number of deleted messages.

ysth
  • 96,171
  • 6
  • 121
  • 214
  • Updated answer. – ysth Nov 29 '16 at 10:37
  • Slow queries are "random", if I execute the same query multiple times, usually works ok, but sometimes I have a slow query. The query is not the problem, but I've started trying to optimize it. – OscarGz Nov 29 '16 at 11:05
  • I don't understand why it's using message_id as index only when I use it to order by, with message.id order by optimizer uses user_id, it's like order by index takes precedence over where index. – OscarGz Nov 29 '16 at 11:08
  • It has to guess whether it is going to get your 100 results fastest by reading through box in order by message id or reading through just one users messages. If you remove the date and deleted conditions, I bet it is right. – ysth Nov 29 '16 at 17:37