1

In MySQL InnoDB and many other database engines, the primary key is implemented as a clustered index. However, after searching a secondary index, the engine must look up the clustered index using the primary keys stored in the secondary index (if there is no covering index).

InnoDB uses a B+ tree for its clustered index, which is a structure with O(log n) search complexity, so we can summarize the procedure as follows:

  1. Using the clustered index: one pass, costing O(log n).
  2. Using a secondary index: two passes. The first pass costs O(log n) and yields m records. The second pass then costs O(log n) for each of those m records, so the total is O(m log n). (See the sketch below.)
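
To make the two-pass lookup concrete, here is a minimal sketch; the table, columns, and index name are hypothetical, not taken from any particular schema:

CREATE TABLE users (
  id    BIGINT AUTO_INCREMENT PRIMARY KEY,   -- clustered index key
  phone VARCHAR(20),
  name  VARCHAR(100),
  KEY idx_phone (phone)                      -- secondary index stores (phone, id)
);

-- Pass 1: find matching ids in idx_phone, O(log n).
-- Pass 2: for each of the m matches, fetch the full row from the clustered
-- index by id (another O(log n) each), because name is not in idx_phone.
SELECT name FROM users WHERE phone = '555-0100';

-- If the query only touches columns covered by idx_phone, the second pass is skipped:
SELECT id FROM users WHERE phone = '555-0100';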

I know that with hashing, the search time complexity can be reduced to O(1), so I am wondering why these database engines prefer B+ trees over hashing techniques (e.g. building a KV store). Is it because records are stored on disk rather than in memory?

Meanwhile, I have another question: some other databases, like RocksDB, use KV storage rather than a B+ tree. Why do they use that?

EDIT

I want to make the question clearer. I find that many tables are designed with an auto increment PK rather than something with actual meaning, like a phone number or an IP, so the B+ tree's advantages are not fully exploited. For example, a B+ tree is good at searching data by range, but searching an auto increment PK by range is rare in practice.

calvin
  • Hashing is not significantly faster than BTree; why implement Hash without a good reason? (I am reading the minds of the original implementers.) – Rick James Mar 10 '21 at 20:52
  • I guess that if you use indices in MySQL on a particular column value (e.g. a `string` or `double`) which is not an `auto increment` value, MySQL will still use B-trees for that. – tonix Aug 12 '21 at 19:00

2 Answers

4

An important characteristic of B-tree indexes is the so-called range scan. Hash indexes don't have that characteristic. The name of the older MySQL table engine, MyISAM, holds a clue. It stands for Indexed Sequential Access Method. The inherent ordering of BTREE indexes is a major feature.

If we have a table credit with, for example, columns credit_id, user_id, datestamp, and amount, we might use this query.

SELECT SUM(amount) amount FROM credit 
 WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
   AND datestamp <  CURDATE();

With a two-column BTREE index on (datestamp, amount) MySQL can random-access the index O(log n) to the first eligible datestamp, and then sequentially access it O(1) for each successive eligible datestamp. And, because amount is in the index, MySQL can completely satisfy the query from the index. (It's called a covering index). InnoDB indexes implicitly include the primary key column, that is, the key to the clustered index holding all the table's data.
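
For concreteness, here is a sketch of how that covering index could be declared; the index name is arbitrary:

ALTER TABLE credit ADD INDEX idx_datestamp_amount (datestamp, amount);

-- With the index in place, EXPLAIN should report "Using index" in its Extra
-- column for the query above, i.e. the index alone satisfies it and no
-- clustered-index lookups are needed.
EXPLAIN SELECT SUM(amount) amount FROM credit
 WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
   AND datestamp <  CURDATE();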

Most large production tables have several covering indexes defined for them, chosen to accelerate queries that take the most time in the particular application.

I'm not claiming that HASH indexes are useless. Far from it. But it's pretty clear that MySQL would work far less efficiently without BTREE indexes.

(The InnoDB engine's code has many optimizations for inserting rows with auto-incremented primary keys. If an application uses something else -- like a randomized GUID -- for a primary key, it can defeat those optimizations.)
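
As a rough illustration of that contrast (the table names and columns are invented for the example):

-- Monotonic PK: new rows always append at the right-hand edge of the
-- clustered index, so pages fill up sequentially.
CREATE TABLE orders_auto (
  id      BIGINT AUTO_INCREMENT PRIMARY KEY,
  payload VARBINARY(255)
);

-- Random PK: each insert lands on an arbitrary leaf page, causing page
-- splits and a much colder buffer pool.
CREATE TABLE orders_uuid (
  id      BINARY(16) PRIMARY KEY,   -- e.g. populated with UUID_TO_BIN(UUID())
  payload VARBINARY(255)
);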

O. Jones
  • Hi Jones, I know a range scan is one advantage of a B+ tree; however, as I listed in my post, I think this feature is not necessary when handling auto inc PKs, because a range of auto inc PKs has no actual meaning. Also, a covering index is good: if I have an index on `col1/col2/col3`, it makes searching by `col1,col2`, for example, convenient. But building indexes also takes a lot of space and design effort. If no covering index can be used, such as when searching by `col2,col3`, we must look back into the clustered index with the PK, and this involves many rounds of iterating over the B+ tree. – calvin Mar 09 '21 at 14:04
  • 3
    But you have all these problems and worse with hash based indexes. Also, with b-trees you at least have a chance that the "ORDER BY" clause matches the b-tree ordering. Hashes are random, and any "ORDER BY" will result in a sort. If you have a reasonable idea of the number of rows, know the distribution of the keys, and only ever want to access single rows by key -- then hashes are slightly better, which is why they are generally preferred for document store/NoSQL databases. – James Anderson Mar 09 '21 at 14:27
  • Thank you for this interesting answer. May I ask why RDBMSs like MySQL don't just use a directory or subdirectories to store a row? E.g. if a primary key in a `users` table is `123456`, why not just store the whole row in a binary file within a directory, e.g. `db/users/123456/row_binary_data` or `db/users/12/34/56/123456`? Is it again because of sort order and range scans? Thank you! – tonix Aug 12 '21 at 18:51
  • What do you think about LSM-trees? They use SSTables (Sorted String Tables), which are segments (files) of data sorted by key (thanks to an in-memory memtable, essentially an AVL tree that is emptied and written to disk when a size threshold is reached - typically a few MB), and use in-memory hash maps to efficiently retrieve data within segments. This kind of indexing also allows efficient range queries, as far as I understand. – tonix Aug 12 '21 at 22:53
1

An efficient hash requires some foreknowledge of the type, number, and distribution of the keys, plus the complexity of handling collisions (two keys ending up with the same hash value). Space must be preallocated, and it can be too small and quickly run out, or too big, resulting in a massive waste of resources.
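
As a side note, MySQL does expose hash indexes, but only on the in-memory MEMORY engine, where exactly these sizing questions are left to the user. A rough sketch, with an invented table name:

-- Hash index: exact-match lookups only, no range scans or index-order reads.
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  user_id    BIGINT,
  PRIMARY KEY USING HASH (session_id)
) ENGINE=MEMORY;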

b-trees are efficient when small and can grow to any size, provided the disk space is available.

You cite the number of operations, but b-trees use simple comparisons, which are cheap, while hashes use complex algorithms, which are expensive. So seven or eight comparisons to find the location of a record in a 64,000-record database probably use less CPU than calculating a hash value.

James Anderson
  • When I look into many MySQL table definitions, they use an auto increment primary key; I think it is easy to optimize. Moreover, even if they use something like an IP or phone number, I can still add an auto increment primary key – calvin Mar 09 '21 at 12:57
  • Meanwhile, why do some other databases prefer a KV store? – calvin Mar 09 '21 at 13:00
  • @calvin Could it maybe be because the key is usually a primitive type, e.g. an `int`? Computing the hash for these data types is relatively cheap (the hash of an `int` is the `int` itself, see https://stackoverflow.com/questions/11890805/hashcode-of-an-int#answer-11890843). I'm just reasoning about your question as well and wanted to share my thoughts in this comment. – tonix Aug 12 '21 at 18:58