I am speaking specifically about InnoDB...
The PRIMARY KEY
is (as you say) clustered with the data. The entire BTree for that (data + PK) is stored in one set of blocks on disk (not 'main memory'). The 'leaf' nodes contain all the columns.
A secondary key is a separate BTree. Structurally the two BTrees are the same with the exception of what is in the leaf nodes. For a secondary key, a copy of the PRIMARY KEY
is put into to the leaf nodes. Hence, when looking up one row ("point query") using a secondary index, there are two BTree drill-downs - one for the secondary index, and a secondary for the PK.
All blocks are 'cached' in the 'buffer_pool', so they are sometimes in main memory, but are always persisted (sooner or later) on disk. (The transaction log, etc) assure that "later" does not violate the rule that the data always persists.)
Your two pictures are a nice start. However...
- The non-leaf nodes are linked together (as you show), but they are not necessarily adjacent on disk. When inserting a new row (or new index entry), a block could 'split' because of being full. This leads to the blocks being scattered around the disk.
- The leaf nodes are also linked together, but could be scattered.
- For Unclustered, well, suggest you start over, taking into account the PK issue, etc.
What you need to know at a higher level than the pictures are trying to convey:
- A point query drills down a btree
- A secondary lookup has to do 2 drill downs
- "Range" scans -- either of the data, or of an index -- are very efficient because they scan through one block and then move on to the (logically) next block via a bidirectional link between blocks at the bottom level. Hence, it is really a B+Tree, not just a BTree.
- (more on range)
WHERE clustered_key BETWEEN ...
is very efficient
- (more on range)
WHERE secondary_key BETWEEN ...
is very efficient at finding the PK values it needs, but then turns into a bunch of (potentially) random point queries.
- All blocks are pretty much equivalent for caching. But (obviously?) the non-leaf nodes tend to live in the cache because of "least recently used" algorithm. (I am leaving out a lot of details.)
- There can be only one Clustered index. (Unless you are willing to duplicate all the data. This has been done in a couple of Engines other than InnoDB.)
- A block contains as many 'records' (data or index or non-leaf) as is practical at them moment -- anywhere from 1 to hundreds.
- By default, a block is 16KB. (And not easy to change.)
- With innodb_file_per_table=ON, all BTrees for a given table live in a single .ibd 'tablespace'.
- With innodb_file_per_table=OFF, all BTrees for all tables live in a single global 'tablespace' called ibdata1. (Again, over-simplified.)
Now for MyISAM:
- The data for one table lives in a file (.MYD).
- All the indexes (including the PRIMARY KEY) for one table live in a file (.MYI)
- All the indexes are BTrees. (The data is not.)
- The leaf nodes of an index 'point' into the data file.
- Index blocks are 1KB.
- The data file is simply a random-access stream.
(There are a lot more details.)