1

I'm fairly new to the B-Tree concept, I'm currently reading slides for a course that can be found here: http://www-db.deis.unibo.it/courses/TBD/Lezioni/02%20-%20Indices.pdf

I read that B-trees have a "minimum occupancy" of 50%.

What does that mean? Is that a good percentage for minimum occupancy? And is it better to have a higher/lower minimum occupancy?

Thanks

mmarc
  • 11
  • 2
  • Please cite and add the source for context. I guess what you mean is, that all nodes, expect for the root have to be at least "half full"? – sticky bit Jul 11 '18 at 10:30
  • Thanks, I edited the question to include my source. I think that's what is meant.. but what does it mean for a node to be at least half full? – mmarc Jul 11 '18 at 11:26

1 Answers1

0

This answer applies to ENGINE = InnoDB.

For all practical purposes, a given BTree is either "full" or 69% full. This does not address individual blocks.

Individual blocks...

  • When initially loading a BTree in key order, it will be filled to 15/16 full.

  • The "last" block can be nearly empty -- assuming the insert thinks that the tree is being appended to.

  • When filling randomly, there will be block splits that leave two consecutive blocks at about 50% full, each.

  • In the long run (continual churn and/or additions) to a BTree, it settles down to an average of about 69%. (This is a fact about BTrees.)

  • When in the middle of a transaction, extra copies of rows may be placed in blocks; after cleanup, those go away.

  • When two adjacent blocks are less than half full, the code may try to combine the blocks.

  • InnoDB preallocates blocks, so some blocks (at any moment) are completely empty.

Some Database purveyors provide all sorts of tunables for min/max/etc occupancy. MySQL follows the KISS principle; nothing tunable. The effect is that the BTrees are reasonably efficient. Further, note that there are limited choices in indexing (for InnoDB):

  • The PRIMARY KEY is unique and clustered; no options here.
  • Secondary indexes (if any) are non-clustered and have the PRIMARY KEY column(s) in the leaf node. That is, to locate the entire row via a secondary key, there are two BTree drill-downs.

A Rule of Thumb (for InnoDB's 16KB blocks): about 100 items are in each node of a BTree. Corollary: a trillion-row table or index will have about 6 levels in the BTree. (Now, isn't this paragraph simpler than those formulas, etc, in your link?)

InnoDB employs "B+Trees", so sequential scans can walk from one leaf node to the next.

See also Wikipedia for another discussion of BTrees.

Oh, back to the question about 50% -- That is "natural". Think about what a "block split" (aka "leaf split") does -- take one full block and turn it into two adjacent half full blocks. It does not make sense to ask for anything other than 50%. (Yeah, you could split a full block into 3, but that seems wasteful. Or you could split before it is completely full, but then nothing much is gained by that.)

Rick James
  • 135,179
  • 13
  • 127
  • 222