14

The MySQL documentation describing the use of index extensions, gives the following table as an example, followed by the query below:

CREATE TABLE t1 (
    i1 INT NOT NULL DEFAULT 0,
    i2 INT NOT NULL DEFAULT 0,
    d DATE DEFAULT NULL,
    PRIMARY KEY (i1, i2),
    INDEX k_d (d)
) ENGINE = InnoDB;

SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';

InnoDB internally will convert the index k_d to include the primary key at the end. That is, the actual index k_d will be on (d, i1, i2), three columns.

The documentation goes on to explain that (emphasis mine):

The optimizer cannot use the primary key in this case because that comprises columns (i1, i2) and the query does not refer to i2. Instead, the optimizer can use the secondary index k_d on (d), and the execution plan depends on whether the extended index is used.

I am confused by the above statement. First it says that i1 is not enough to use the primary key index of two columns (i1, i2). Then, in the second sentence, it says that the index k_d on (d, i1, i2) can be used, despite that only d and i1 are being used, with i2 absent.

My general understanding of indices in MySQL, and in other flavors of SQL, is that a left portion of an index can be used if a subset of all columns in the index are present, starting from the left.

What is different about a primary key (clustered) index and a non clustered secondary index which allows the latter to use a partial index, but the former cannot?

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • 1
    I think the "cannot" is incorrect. If `k_d` is missing, the primary key will be user: http://sqlfiddle.com/#!9/9f9d6f/1 – CodeCaster Sep 03 '18 at 09:02

2 Answers2

9

The documentation is partially inaccurate on the page you linked to.

Demo, run on MySQL 5.7.21:

mysql [localhost] {msandbox} (test) > CREATE TABLE t1 (
    ->     i1 INT NOT NULL DEFAULT 0,
    ->     i2 INT NOT NULL DEFAULT 0,
    ->     d DATE DEFAULT NULL,
    ->     PRIMARY KEY (i1, i2),
    ->     INDEX k_d (d)
    -> ) ENGINE = InnoDB;

mysql [localhost] {msandbox} (test) > explain SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | PRIMARY,k_d   | PRIMARY | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+

This query chooses the PRIMARY index, and you can see the key_len is 4, proving it will use only one of the 32-bit INT columns.

When using InnoDB tables, MySQL often prefers to use the PRIMARY index (the clustered index) because it's more efficient than using a secondary index.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • This sounds like good detective work. If you don't mind, I would like to wait a bit before awarding you the booty. – Tim Biegeleisen Sep 06 '18 at 15:53
  • Agreed. Multiple Column Indexes can use the Leftmost Prefix which is demonstrated, oddly, in the same paragraph where it says it can't use `i1` in the primary index... Honestly that whole page could be cleaned up by just dropping the verbiage about the primary index as the page's topic only covers Extended Secondary Indexes. – JNevill Sep 06 '18 at 15:53
  • [Documentation here](https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html) on how a MultiColumn Index can use the Leftmost Prefix, which contradicts the paragraph OP is stuck on. – JNevill Sep 06 '18 at 15:55
  • 3
    I cut the tech writers some slack. The MySQL manual is over 4000 pages long, and some of the topics are not understood even by the software developers. – Bill Karwin Sep 06 '18 at 15:57
  • 2
    Bill is right. A prefix of either the primary index and the secondary index may be used here. Which index is selected will be a cost-based decision by the query optimizer. I have filed a documenation bug: https://bugs.mysql.com/92347 – Øystein Grøvlen Sep 10 '18 at 06:57
1

In a case like this

WHERE i1 = 3 AND d = '2000-01-01';

I prefer to explicitly say INDEX(d, i1) (or INDEX(i1, d)). My rationale is that I am telling the reader that I thought about the indexes and realized that this was best for the query. And it would be a "covering" index, hence a little faster.

Granted, the INDEX(d) you have should have been equivalent to INDEX(d, i1, i2), which should have been used effectively and efficently. As for why, I would guess an oversight in the Optimizer.

As for the documentation, there are a few poorly phrased places. They welcome documentation critiques at bugs.mysql.com .

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • See Bill Karwin's answer, which shows that in fact the optimizer _does_ use the covering index; there is no oversight AFAIK. Agree with you about MySQL's documentation. It has an open source flavor to it. – Tim Biegeleisen Sep 08 '18 at 12:50
  • 1
    @TimBiegeleisen - "Using index" == 'covering', "Using where" means something else. See that Oystein wrote a doc bug report (Comment above). – Rick James Sep 12 '18 at 12:45