
I have this table:

CREATE TABLE `tree_paths` (
  `child_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned NOT NULL,
  `sponsor_upline` tinyint(1) NOT NULL DEFAULT '0',
  `left_leg` tinyint(1) NOT NULL,
  `binary_level` smallint(5) unsigned NOT NULL DEFAULT '0',
  `sponsor_level` smallint(5) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`child_id`,`parent_id`),
  KEY `tree_paths_parent_id_foreign` (`parent_id`),
  CONSTRAINT `tree_paths_child_id_foreign` FOREIGN KEY (`child_id`) REFERENCES `users` (`id`) ON DELETE CASCADE,
  CONSTRAINT `tree_paths_parent_id_foreign` FOREIGN KEY (`parent_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

An int(10) consumes 4 bytes, a smallint(5) 2 bytes, and a tinyint(1) 1 byte. With two columns of each type, one row of raw data should be 2×4 + 2×2 + 2×1 = 14 bytes.

I have ~16 million rows inserted. For some reason, the Avg_row_length is 57 bytes instead of 14 bytes.

[Screenshot: table status output showing Avg_row_length = 57, with Index_length at about a quarter of the total size]
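For reference, those numbers come straight from the table metadata; they can be queried directly (a sketch, with `mydb` as a placeholder for the actual schema name):

SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'   -- placeholder: your schema name
  AND TABLE_NAME = 'tree_paths';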

Is it realistic that the B-tree for the composite primary key uses three times as much storage as the data itself?

In addition, I have read in https://stackoverflow.com/a/30318725/2311074 that

the primary key is already accounted in the DATA_LENGTH. INDEX_LENGTH means "the size of all secondary indexes".

If that is true, then why is Index_length a quarter of the total size? As one can see above, I have no secondary keys. Or does it come from the FOREIGN keys?
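For completeness, the indexes that actually exist on the table can be listed directly (no assumptions beyond the DDL above):

SHOW INDEX FROM tree_paths;

For this table it shows the PRIMARY key plus `tree_paths_parent_id_foreign` on `parent_id`, i.e. the secondary index that the foreign key constraint required.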

Remarks:

I found similar questions to mine.

One is Why is InnoDB table size much larger than expected?, but it is 10 years old and I don't fully follow it. The answer to that question is that a hidden 6-byte column explains the discrepancy there. However, my discrepancy is not 6 but 43 bytes (57 - 14). And that user did not define any key, yet Index_length was still huge. So I don't see how any of that applies here; maybe things were different 10 years ago.

Another answer that I found was at Average row length higher than possible. The point in that answer is that InnoDB just allocates more space than it needs. However, when I check SHOW TABLE STATUS, I see only 7.3 MB free. So it seems I can dismiss this as well.
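A quick way to rule out stale bookkeeping before dismissing it is to rebuild the table and refresh its statistics (a sketch; on InnoDB, OPTIMIZE TABLE is implemented as a rebuild plus analyze):

OPTIMIZE TABLE tree_paths;  -- rebuilds the table and its indexes, reclaiming free space
ANALYZE TABLE tree_paths;   -- refreshes the statistics behind SHOW TABLE STATUS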

  • Note that the number in parentheses following an INT declaration is almost meaningless – Strawberry Oct 18 '20 at 18:29
  • @Strawberry it's the display width and has no impact on the size, right? – Adam Oct 18 '20 at 18:31
  • It has (almost) no impact on anything – Strawberry Oct 18 '20 at 18:33
  • @Strawberry alright, good to know :) – Adam Oct 18 '20 at 18:33
  • @scaisEdge yes, but I have no charset fields, only integers, so it should not matter, right? – Adam Oct 18 '20 at 18:53
  • If you dump and reload the table into a different table name or database, what is the storage reported then? My guess is your table may have had deletions whose storage was not reclaimed. – ysth Oct 18 '20 at 18:59
  • @ysth it's actually a fresh table that I generated from another table (moving from a native tree to a closure table), so I can guarantee that there were 0 deletions, only inserts. But I will still try what happens; it may take some time. I am currently creating the same table without foreign keys, which takes about half an hour, before I can try the re-dump. – Adam Oct 18 '20 at 19:04
  • @Strawberry - FYI, 8.0.17 deprecates the number in `INT(10)`. (`VARCHARs` continue to use such.) – Rick James Oct 18 '20 at 19:47
  • @RickJames (FYI I knew that) – Strawberry Oct 18 '20 at 20:12
  • @Strawberry - OK. Then I am talking to everyone else who has not yet encountered it. Oracle seems to be getting more aggressive at stepping from depreciation to removal. I hope people will notice before getting burned. (I have at least one use of `ZEROFILL` in my code; bummer.) – Rick James Oct 18 '20 at 20:15
  • (Fair enough. But I think it's 'deprecation', although I don't doubt it detrimentally affects value ;-) ) – Strawberry Oct 18 '20 at 20:22

1 Answer


There are many reasons for the reported avg row size to be higher than the raw data size.

  • It is an approximation. (I have found that it is typically 2x-3x high.) In one extreme case -- one row in the table -- it will claim 16384 bytes per row; that is one InnoDB block. The number of rows in the table is estimated; the disk space used for the rows is exact, but see the overheads below. The avg row size is the quotient of those two. (A worked estimate follows this list.)

  • Overhead per column -- 1 or 2 bytes

  • Overhead per row -- 20-30 bytes -- for handling transactions, finding rows in a block, etc.

  • Overhead per block -- some number of bytes per 16KB block

  • Overhead for thrashing in a BTree -- the minimum is about 1/16 of a block, the maximum about half a block; the average is about 30% after lots of deletes and/or random inserts.

  • Overhead for pre-allocating chunks of disk space (1MB? 8MB?)

  • As a table grows from fitting in one block, the layout algorithm shifts, and the percentage of overhead temporarily spikes.

  • Deleted rows do not return their space to the OS, so the file size stays constant, thereby increasing the apparent row size.

  • If you do not have an explicit PRIMARY KEY or a UNIQUE key that can be promoted to PK, then there is an inaccessible 6-byte field (per row) for the PK.

  • Large TEXT/BLOB and even VARCHAR are stored "off-record". This complicates the computations a lot. And it is dependent on which of the 4 ROW_FORMATs you are using. In some cases there is a 20-byte "pointer" for each such cell.

  • FOREIGN KEY constraints do not add to the space required, except that they may force the creation of an index.

  • INDEXes other than the PRIMARY KEY are not included in the avg_row_length.

  • The PRIMARY KEY usually involves very little overhead in the data BTree. A simple Rule of Thumb is 1% overhead (on top of the column, itself). This overhead is the non-leaf nodes of the BTree.

  • While an InnoDB transaction is busy, any rows it has modified are held in the "history list". This leads to more overhead.

  • (Not totally related). InnoDB's COMPRESSED has issues -- it gives only about 2x compression, unlike typical text compression of 3x. It costs some RAM because of needing to have both the compressed and uncompressed data in the buffer_pool at the same time (for at least some blocks).
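Applying those estimates to the table in the question gives a back-of-the-envelope reconciliation (a rough sketch, not exact accounting):

raw column data:         2×4 + 2×2 + 2×1  =  14 bytes
per-column overhead:     6 columns × 1-2  =   6-12 bytes
per-row overhead:                            20-30 bytes
                                            ------------
plausible cost per row:                     ~40-56 bytes

Add block-level overhead and pages that are not completely full, and the reported 57 bytes per row is unremarkable.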

SHOW TABLE STATUS and fetching from information_schema.TABLES give the same data. There are also ways to get some insight into the depth of the B+Tree for the data and for each index; one is sketched below.
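One such way, assuming persistent index statistics are enabled (the default since MySQL 5.6), is the mysql.innodb_index_stats table, which reports per-index page counts (`mydb` is again a placeholder):

SELECT index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'mydb'
  AND table_name = 'tree_paths'
  AND stat_name IN ('n_leaf_pages', 'size');  -- counts of 16KB pages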
