1

MySQL version 5.6.33

I have two tables (files & details) to inner join

table files, has an index called idx_record(record_id)

table details, has an index called idx_end_org_record(end_at, org_id, record_id)

explain
SELECT
    id as file_id,
    record_id,
    recording_path
FROM `files`
INNER JOIN
(
    SELECT `details`.`record_id`
    FROM `details` 
    WHERE `details`.`org_id` IN (6231, 6232, 6233, 6234) 
    AND (`details`.`end_at` BETWEEN '2017-05-31 16:00:00' AND '2017-06-13 16:00:00') 
    ORDER BY end_at desc
) as b
    on `files`.record_id = b.record_id 
WHERE (file_name IS NOT NULL )
LIMIT 30

Output as below

+----+-------------+--------------+-------+--------------------+--------------------+---------+------------------+---------+--------------------------+
| id | select_type | table        | type  | possible_keys      | key                | key_len | ref              | rows    | Extra                    |
+----+-------------+--------------+-------+--------------------+--------------------+---------+------------------+---------+--------------------------+
|  1 | PRIMARY     | <derived2>   | ALL   | NULL               | NULL               | NULL    | NULL             | 3844632 | Using where              |
|  1 | PRIMARY     | files        | ref   | idx_record         | idx_record         | 5       | b.record_id      |       1 | Using where              |
|  2 | DERIVED     | details      | range | idx_end_org_record | idx_end_org_record | 11      | NULL             | 3844632 | Using where; Using index |
+----+-------------+--------------+-------+--------------------+--------------------+---------+------------------+---------+--------------------------+

As you can see, the <derived2> table is not using any index, making this query take almost 1 second to finish.

Any help is appreciated!


Besides, removing WHERE (file_name IS NOT NULL ) in the query doesn't make any difference.

And in mysql 5.7, this problem doesn't even exist, but currently I'm trying to solve this in 5.6.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
JOJO
  • 13
  • 1
  • 3

1 Answers1

2

I don't see the point of the subquery. In particular, the ORDER BY in the subquery is meaningless. If you write the query as a direct join between the two tables, then the optimizer should be able to take advantage of an index on the join column:

SELECT
id as file_id,
record_id,
recording_path
FROM files f
INNER JOIN details d
    ON f.record_id = d.record_id AND
       d.org_id IN (6231, 6232, 6233, 6234) AND
       d.end_at BETWEEN '2017-05-31 16:00:00' AND '2017-06-13 16:00:00'
WHERE f.file_name IS NOT NULL
ORDER BY <some column>    -- it doesn't make sense most of the time to use LIMIT
LIMIT 30                  -- without ORDER BY

Adding an index on record_id, org_id, and end_at in the details table, and on file_name in the files table, should give you some peformance improvements.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Note that LIMIT without ORDER BY is fairly meaningless – Strawberry Jun 22 '17 at 06:40
  • @Strawberry Then let's add some meaning here :-) – Tim Biegeleisen Jun 22 '17 at 06:42
  • 1. This is a inner join, so the subquery does make sense: it find out all files not only with not-null file_name, but also whose details meet some conditions. – JOJO Jun 22 '17 at 07:18
  • 2. Thanks for your advice, but removing `ORDER BY` doesn't help much, only saved like 100ms. I have tried to change column order in the index, and it turned out that idx_end_org_record(org_id, end_at, record_id) in table details save nearly 300ms. The difference between these two index is that the latter make the `rows` in explain output 10 times less. However, it still not use index in the `` table, using index will make sense, isn't it? – JOJO Jun 22 '17 at 07:27
  • Did you actually use my full answer, or did you just do a few hodgepodge things to your original query? – Tim Biegeleisen Jun 22 '17 at 07:29
  • Sorry, I see your point now. In your query, you move the conditions into the on clause, it seems remove the derived table in optimizer – JOJO Jun 22 '17 at 07:35
  • Ohmy, this is really awesome, putting conditions in ON clause, can avoid generating derived table! (Even subquery itself execute very fast, derived table can be a problem) – JOJO Jun 22 '17 at 07:51
  • Would you mind if I ask a related question that, when to use subquery while I can put these conditions all in ON clause like you taught me to ? – JOJO Jun 22 '17 at 07:56
  • You don't need to even put them in the `ON` clause I believe in this case. You could have also put them into the `WHERE` clause. If you have a substantially different question, then ask a new one. – Tim Biegeleisen Jun 22 '17 at 07:57