The execution plans below seem disappointing and suboptimal, even though the queries are straightforward.
I'm using MySQL 5.7. Here is the fiddle (although it only offers 5.6).
CREATE TABLE `event` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(63) CHARSET ASCII COLLATE ASCII_BIN NOT NULL,
`is_sequenced` TINYINT(3) UNSIGNED NOT NULL,
`sequence_number` BIGINT(20) UNSIGNED DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `Name-SequenceNumber` (`name`,`sequence_number`),
KEY `Name-IsSequenced` (`name`,`is_sequenced`,`id`)
) ENGINE=INNODB
;
INSERT INTO `event`
(id, `name`, is_sequenced, sequence_number)
VALUES
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL),
(NULL, 'OrderCreated', 0, NULL)
;
We will be using the Name-IsSequenced
secondary index. Let's try the following EXPLAIN
. (The queries are in the Fiddle. Open "View execution plan" to see their EXPLAIN
result.)
EXPLAIN
SELECT * -- This part needs the PK
FROM `event` e
WHERE e.name = 'OrderCreated'
AND e.is_sequenced = 0
AND e.id <= 3
;
So far, so good. Using index condition
makes sense: the entire condition can be resolved on the expected index Name-IsSequenced
, and then the PK is needed to get the remaining data for SELECT *
.
We should be able to improve that to Using index
if we only select something that is part of our secondary index, right? (Note that the PK is always part of any secondary index, but we can even ensure this by including id
at the end of our secondary index. The result is the same, as it should be.)
EXPLAIN
SELECT id
FROM `event` e
WHERE e.name = 'OrderCreated'
AND e.is_sequenced = 0
AND e.id <= 3
;
Now, the result is Using where; Using index
. Wait, that's... worse?! We made it do less work, and the plan indicates that it is working harder.
Using index
should be achievable. Find the range where name=OrderCreated
, then inside it find the subrange where is_sequenced=0
, then inside that find the subrange where id<=3
.
Curiously, I have other experiments (with some more data) where I can get Using index
by changing id<=3
to id=3
(combined with FORCE INDEX
to prevent it from prefering to the PK). I see no reason for the difference. (If we try this with the Fiddle, it stays the same - perhaps because of the small data set.)
Can anybody explain why the execution plan does not indicate the expected efficient use of the secondary index? Is there are a way to straighten it out?