Your example might be 20MB.
"In memory" really means "in the InnoDB buffer_pool", whose size is controlled by innodb_buffer_pool_size
, which should be set to about 70% of available RAM.
If your query hits the disk instead of finding everything cached in the buffer_pool, it will run (this is just a Rule of Thumb) 10 times as slow.
What you are saying on "clustered index" is misleading. Let me turn things around...
- InnoDB really needs a
PRIMARY KEY
.
- A PK is (by definition in MySQL)
UNIQUE
.
- There can be only one PK for a table.
- The PK can be a "natural" key composed of one (or more) columns that 'naturally' work.
- If you don't have a "natural" choice, then use
id INT UNSIGNED NOT NULL AUTO_INCREMENT
.
- The PK and the data are stored in the same BTree. (Actually a B+Tree.) This leads to "the PK is clustered with the data".
The real question is not whether something is clustered, but whether it is cached in RAM. (Remember the 10x RoT.)
- If the table is small, it will stay in cache (once all its blocks are touched), hence avoid disk hits.
- If some subset of a huge table is "hot", it will tend to stay in cache.
- If you must access a huge table "randomly", you will suffer a slowdown due to lots of disk hits. (This happens when using UUIDs as
PRIMARY KEY
or other type of INDEX
.)
How the database decide when to process the data in-memory, and when not?
That's 'wrong', too. All processing is in memory. On a block-by-block basis, pieces of the tables and indexes are moved into / out of the buffer_pool. A block (in InnoDB) is 16KB. And the buffer_pool is a "cache" of such blocks.
SELECT * FROM Employee;
is simple, but costly. It operates thus:
- "Open" table
Employee
(if not already open -- a different 'cache' handles this).
- Go to the start of the table. This involves drilling down the left side of the PK's BTree to the first leaf node (block). And fetch it into the buffer_pool if not already cached.
- Read a row -- this will be in that leaf node.
- Read next row -- this is probably in the same block. If not, get the 'next' block (read from disk if necessary).
- Repeat step 4 until finished with the table.
Things get more interesting if you have a WHERE
clause. And then it depends on whether the PK or some other INDEX
is involved.
Etc, etc.