1

I have a query:

    SELECT abt.duration, a.id, a.a_id, a.a_tag
    FROM active_begin_times AS abt INNER JOIN sources AS a ON a.id = abt.a_source 
    AND a.u IN (29, 28, 27, 26, 25, 24) 
    WHERE (abt.duration > 86400000) 
    and (abt.begin_timestamp<=1465185600000 and 1465617600000<=abt.end_timestamp
    OR 1465185600000<=abt.begin_timestamp and abt.begin_timestamp<=1465617600000 
    OR 1465185600000<=abt.end_timestamp and abt.end_timestamp<=1465617600000) 
    order by abt.begin_timestamp asc LIMIT 0, 10

The database has about 5 Million entries. This query takes about 30 seconds to run. But if I change by what I order the table to duration, it is calculated within fraction of seconds, like in query below:

SELECT abt.duration, a.id, a.a_id, a.a_tag
FROM active_begin_times AS abt INNER JOIN sources AS a ON a.id = abt.a_source 
AND a.u IN (29, 28, 27, 26, 25, 24) 
WHERE (abt.duration > 86400000) 
    and (abt.begin_timestamp<=1465185600000 and 1465617600000<=abt.end_timestamp
    OR 1465185600000<=abt.begin_timestamp and abt.begin_timestamp<=1465617600000 
    OR 1465185600000<=abt.end_timestamp and abt.end_timestamp<=1465617600000) 
    order by abt.duration asc LIMIT 0, 10

I have indexes on both duration and on begin_timestamp. And when I changed the order by to abt.id, it again took a lot of time to calculate, which also has indexes.

Also, what I noticed is for this particular set of conditions, the query returns 2 rows. But if I change the variables and I get this query to return 20 to 30 odd rows, then the calculations are instantaneous.

Could someone please explain, the above 2 cases? I tried looking in SO but could not either understand the behaviour or was not satisfied with the explanation.

The EXPLAIN for the first query returns:

| id | select_type | table | type   | possible_keys                                            | key             | key_len | ref                   | rows | Extra       |
|----|-------------|-------|--------|----------------------------------------------------------|-----------------|---------|-----------------------|------|-------------|
| 1  | SIMPLE      | abet  | index  | FK2681F9347A6A34B,begin_timestamp,end_timestamp,duration | begin_timestamp | 9       | \N                    | 6094 | Using where |
| 1  | SIMPLE      | a     | eq_ref | PRIMARY,FK722DBCA9F603AF                                 | PRIMARY         | 4       | db.abt.alarm_source   | 1    | Using where |

And from the second query I get:

| id | select_type | table | type   | possible_keys                                            | key      | key_len | ref                   | rows | Extra                              |
|----|-------------|-------|--------|----------------------------------------------------------|----------|---------|-----------------------|------|------------------------------------|
| 1  | SIMPLE      | abet  | range  | FK2681F9347A6A34B,begin_timestamp,end_timestamp,duration | duration | 9       | \N                    | 8597 | Using index condition; Using where |
| 1  | SIMPLE      | a     | eq_ref | PRIMARY,FK722DBCA9F603AF                                 | PRIMARY  | 4       | db.abt.alarm_source   | 1    | Using where                        |

I visible difference I can see is in the Extras columns where in the 2 query is says Using index condition;, bit I am not sure what to make of it.

The output of SHOW CREATE TABLE active_begin_time:

CREATE TABLE active_begin_times (
  id int(11) NOT NULL AUTO_INCREMENT,
  begin_timezone_offset int(11) DEFAULT NULL,
  begin_timezone_suffix varchar(100) DEFAULT NULL,
  begin_timestamp bigint(20) DEFAULT NULL,
  begin_timestamp_date varchar(100) DEFAULT NULL,
  duration bigint(20) DEFAULT NULL,
  end_timezone_offset int(11) DEFAULT NULL,
  end_timezone_suffix varchar(100) DEFAULT NULL,
  end_timestamp bigint(20) DEFAULT NULL,
  end_timestamp_date varchar(100) DEFAULT NULL,
  incomplete int(11) DEFAULT NULL,
  a_source int(11) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY FK2681F9347A6A34B (alarm_source),
  KEY begin_timestamp (begin_timestamp),
  KEY end_timestamp (end_timestamp),
  KEY begin_timezone_offset (begin_timezone_offset),
  KEY end_timezone_offset (end_timezone_offset),
  KEY duration (duration),
  KEY begin_timestamp_date (begin_timestamp_date),
  KEY end_timestamp_date (end_timestamp_date)
) ENGINE=MyISAM AUTO_INCREMENT=6164640 DEFAULT CHARSET=latin1
rd22
  • 1,032
  • 1
  • 18
  • 34

3 Answers3

1

For the first query the index is used to locate and read full table rows in order specified by the index and then the whole where clause is tested against each row. So it's like a full table scan in the order specified by the index.

The second is much faster because it is using the index to limit the rows in the first place (type: range) and it's doing this at the storage engine level (extras: using index condition) without reading the whole table rows to test (abt.duration > 86400000). For those rows that matched the first condition the full table rows is read to test against the rest of the where clause.

Without looking at the explain results, if you compare the two parts of the where clause, it is easily possible to use an index for the first part (abt.duration > 86400000) but not for the second part where you both have abt.begin_timestamp<=1465185600000 and 1465185600000<=abt.begin_timestamp and even a third condition on end_timestamp not the begin_timestamp.

Further reading: Index Condition Pushdown Optimization

Amir Rahimi Farahani
  • 1,580
  • 1
  • 12
  • 14
1
WHERE    duration > 1234 AND ...
ORDER BY duration LIMIT 10

can use INDEX(duration) very effectively:

  1. Locate, in the index, 1234
  2. Grab the next 10 rows that satisfy the rest of the WHERE. Then STOP. Note: If there is other filtering (AND ...), and the 10 rows come soon, this is fast. If, on the other hand, the 10 rows come much later in the table, this is slow.

Your other case is more complex since the begin_timestamp is not in a simple where clause. Instead it must:

  1. Find all the rows matching the WHERE (EXPLAIN estimates thousands).
  2. Sort them according to begin_timestamp (no index is useful)
  3. deliver 10 rows.

In fact, this may be faster than the 'slow' case, above. It depends on the distribution of the data values and other things that the optimizer does not know enough about. So... There will be times when the optimizer picks the "wrong" way to evaluate this kind of query, and end up slower than it should be.

The comment about a 'covering' index was incorrect since it does not include a_source.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Then why the first query is faster when it outputs more rows, the case I mentioned in the question? There is another observation, when the limit is changed to `LIMIT 0, 1000`, the first query just takes 2 seconds to compute. – rd22 Jul 21 '16 at 04:59
  • See my edits. It _may_ be that the optimizer said "Oh, I need 'lots' of rows, so I will use the sort mechanism. And that happens to work faster. – Rick James Jul 21 '16 at 15:26
  • Oh, a simplification: The ORs can be eliminated: `(begin < 14656... AND 14651... < end)` Getting rid of the ORs _may_ speed up the query, also. – Rick James Jul 21 '16 at 15:31
0

because your table has an index on duration (not the other one), it gets the first 10 rows, and is done. In fact, it has a covering index on it, uber fast. It is part of the composite key satisfying all in the query with no hop to to the data page. Granted, there is a join. Two tables involved.

Drew
  • 24,851
  • 10
  • 43
  • 78
  • Hi Drew, I did not get it, what do you mean by "because your table has an index on duration(not the other one)" ? – rd22 Jul 20 '16 at 06:56
  • Ah, if you had a `key \`comp\` (begin_timestamp,end_timestamp,duration)` it would be a covering composite, and you could ditch the current key on just `begin_timestamp` because this new composite would cover that as it is left-most (first). That said, be careful to not over-index especially for an infrequent usage. As indexes slow down inserts and *can* slow down updates. Indexes can slow down updates only when the key part is changed, not the non-key part. Indexes can speed up updates when the keys is used to find the data to update. One should always review their index choices – Drew Jul 20 '16 at 07:02
  • Shadow and I were haggling about it a week ago [here](http://stackoverflow.com/questions/38268050/innodb-composite-indexing-order-and-insert-performance/38268187#comment63984651_38268187) . A bit related, quite a bit not. No two situations are ever the same. So to be clear, only you should decide what you want and know is best for your system and index choices. I try to always throw out caveat statements before I go and say "just create this new index". Otherwise it is cheap and irresponsible of me. – Drew Jul 20 '16 at 07:09