0

I am trying to estimate the real disk-usage required space for each record of my table in MySQL RDBMS.

The table has a structure like this:

  • ID INT 4 byte;
  • VARCHAR(34) 34 byte;
  • INT 4 byte;
  • INT(5) 4 byte;
  • INT 4 byte;
  • INT 4 byte which is also a FOREIGN KEY;

So there are 5 INT fields and a VARCHAR of a maximum of 34 chars (i.e. 34 bytes).

I have 2 questions:

1) The total should be 54 bytes per record (with variable VARCHAR, of course) am I right when I am saying that, or there are also some over-head bytes which I should consider when estimating the disk-usage space?

2) I have also used INT(5) instead of CHAR(5) cause I need to store only exactly 5 digits in that field (I am going to do that by application, with regExp and string length, cause I know that INT(5) could be more than an int with 5 digits). But could this be considered such as an optimization by the disk-usage space cause I am using an INT (4 bytes) instead of a CHAR(5) which is 5 bytes, i.e. 1 more byte per record?

Thanks for the attention!

tonix
  • 6,671
  • 13
  • 75
  • 136
  • Check this answer - there are some calculations http://dba.stackexchange.com/questions/59909/how-to-calculate-the-space-i-need-for-the-indexes/59910#59910 – akuzminsky Mar 06 '14 at 21:10
  • To calculate exact size give additional info: 1) Is ID primary key? 2) VARCHAR field - what encoding? 3) Are the fields NULL-able? – akuzminsky Mar 06 '14 at 21:15
  • Yes ID is the PRIMARY KEY AUTO_INCREMENT, the encoding of the VARCHAR field is utf-8 encoded (the CHARSET of the table is set to UTF8), and all the fields are NOT NULL. the last INT field is also a FOREIGN KEY as I said before. So how much overhead am I missing? – tonix Mar 07 '14 at 07:44

1 Answers1

0

One record itself will use

1 byte in offsets

0 bytes in NULLable bits

5 bytes in "extra bytes" header

4 bytes ID

6 bytes transaction id

7 bytes rollback pointer

0-3*34 bytes in VARCHAR(34) (one character may take up to 3 bytes because of UTF8)

4*4 bytes in other integers

Each distinct value of FK will lead to one record in a secondary index. it will use

5 bytes in "extra bytes" header

4 bytes INT for FK value

4 bytes INT for Primary key

Other overhead is page level: 120 bytes per page (16k) in headers page fill factor 15/16 - i.e. one page may contain 15k in records.

And the last - add space used by non-leaf pages, which should be small anyway

So, answer to question - 1) yes there will be some overhead that you can calculate using information above.

2) CHAR(5) in UTF8 will add a byte for its length, so INT looks reasoaanle to use

akuzminsky
  • 2,190
  • 15
  • 21
  • If you are using `utf8mb4_unicode_ci` , you can have also 4 byte per character. But I think OP uses `utf8_general_ci` – Daniel W. Mar 07 '14 at 14:23