InnoDB B+ tree index - duplicate values

Question

How are duplicate keys handled in InnoDB's implementation of B+ tree for it's indexes.

For example, if there is a table with 1 million rows having a column with cardinality of 10. If we create an index on this column, how will the resulting B+ tree would look like?

Will it just have 10 keys and the value of each key is the list of primary keys which belong to that key (if yes, in what structure? Linked list?) or will it have 1M keys (if yes, then B+ tree would have to be handled differently)?

score 4 · Accepted Answer · answered Jul 09 '16 at 17:53

In some sense, an InnoDB BTree has no duplicates. This is because the columns of the PRIMARY KEY are appended to the columns specified for a secondary key. That leads to a fully-ordered list.

When you lookup via a secondary key (or the initial part of a key), the query will drill down the BTree to find the first row in the index matching what you gave, then scan forward to get any others. To get the rest of the columns, it takes the PRIMARY KEY columns to do a second BTree lookup.

The Optimizer will rarely use an index with "low cardinality". For example, a yes/no or true/false or male/female column should not be indexed. The Optimizer would find it faster to simply scan the table rather than bounce back and forth between the index and (via the PK columns) the main BTree.

The cutoff for when to use the index versus punting is somewhere around 20%, depending on the phase of the moon.

Thanks Rick. That seems to make sense. Do you have any reference - in manual or somewhere else? — Vikk, Jul 11 '16 at 10:04
Alas, no. There are many web pages in the documentation, but they tend to be more precise, and less practical. This is part of my collected knowledge of how indexing works. I try to rephrase it into a 'usable' way. I _have_ repeatedly seen the 20% (10% to 30%) demonstrated. 5.7 redesigned the "cost model" for deciding on the query plan, but still boils down to what I said. — Rick James, Jul 11 '16 at 17:19

score 2 · Answer 2 · edited May 23 '17 at 12:16

Bad index

The case you propose is a bad one for a B+ tree. A cardinality of 10 means only 10 of the 1 million values are unique. Actually it is not only bad for a B+ tree, it is a bad index in general. Based on this index you will on average be left with a subset of approx. 100,000 values, which you either have to look through or use another value to filter further.

B+ tree properties

Concerning the structure of the resulting tree there are some things to keep in mind here:

A node cannot contain arbitrary much data.

Inserts may require splits if the leaf node is full

Occasionally the split of a leaf node necessitates split of the next higher node

In worst case scenarios the split may cascade all the way up to the root node

https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-

Leafs are linked as a double-linked list.

Leaf nodes are linked together as doubly linked list

[…]

Entire tree may be scanned without visiting the higher nodes at all

https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-

Expectation

If you insert a lot of data with keys which more or less belong all to the same equivalence class, I would expect a tree, which will not help a lot. The 10 keys might be present solely in the root node, and all data deeper in the tree will just be unsorted (because there is nothing left to sort it).

Due to the fact that the leafs are double-linked lists you are basically left with what I've written in the beginning: You have to traverse a big subset of the values. Concerning the given index this had to be expected and the B+ tree might doing well given the circumstances (a list is ok for just going through all data).

Actually this goes one abstraction deeper: The leafs are double-linked, but there are multiple values in each leaf (data or link to PK). Nevertheless these are in a list too, so if you just traverse everything it makes not much of a difference.

Examining InnoDB space

Please see that you can also investigate what MySQL is really building. There are tools to inspect the built index data structures, see for example

I know it's a bad index. It's just for my understanding of innodb internals. Sorry but your answer is not really clear about how this data will be stored in the tree. I'll try the tools in links. Thanks. — Vikk, Jul 05 '16 at 08:25
Maybe you should just read the two links I provided and also the MySQL manual concerning that. There is (or at least was) a chapter in the manual addressing the MySQL internals, maybe this if of interest for you too (maybe look at the manual for older versions). — Ulrich Thomas Gabor, Jul 05 '16 at 08:54

score 2 · Answer 3 · answered Jul 05 '16 at 14:38

InnoDB stores table in B+ tree index called internally PRIMARY. The key of the index is your primary key fields.

If you define a secondary index there will be additional B+ tree index(in .ibd or ibdata1) where the key is the secondary index fields and value is the primary key.

B+ tree itself doesn't require key to be unique. Uniqueness of PRIMARY and all UNIQUE indexes are enforced at server level.

Here're some slides about how InnoDB organizes indexes and uses them to access the data. http://www.slideshare.net/akuzminsky/efficient-indexes-in-mysql#downloads-panel

InnoDB B+ tree index - duplicate values

3 Answers3

Bad index

B+ tree properties

Expectation

Examining InnoDB space