
I am hoping someone can explain why two hours of data purging resulted in a mere 32 KB reduction in data usage on my MySQL instance. Here are my details:

I have a MySQL database (running on Amazon RDS) from which I am attempting to purge data. I am doing this to avoid running out of storage space: Amazon caps you at 1 TB, and if we take no action we will eventually hit that limit.

I am using this command to calculate the size of my tables and indices:

select * from information_schema.tables;
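A more targeted form of that query, for what it's worth (the schema name is a placeholder; the columns are the standard INFORMATION_SCHEMA.TABLES ones):

    SELECT TABLE_NAME,
           DATA_LENGTH,
           INDEX_LENGTH,
           DATA_FREE
    FROM   information_schema.tables
    WHERE  TABLE_SCHEMA = 'my_schema'   -- placeholder: your database name
    ORDER  BY DATA_LENGTH + INDEX_LENGTH DESC;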

There are two InnoDB tables in particular that consume the majority of my storage. I have a process that is iterating through my largest table deleting records. At time t = 0 I ran the above SQL query and got the following results for data length and index data length:

  • Data length: 56431116288
  • Index data length: 74233151488

Two hours later, after continuously running my database cleaner process, I ran the above SQL statement and got the following:

  • Data length: 56431083520
  • Index data length: 74126147584

That basically means I shaved off 56431116288 - 56431083520 = 32768 bytes (32 KB) of table data, but 74233151488 - 74126147584 = 107003904 bytes (roughly 102 MB) of index data.

The reduction in the index makes sense. The reduction in the table data is insanely small. No other data could have been inserted during this time, because I am running this test on a backup copy of my database (one of the nice things about RDS is that you can spin up a complete replica of your DB to run experiments like this one). I also confirmed that the AUTO_INCREMENT value was identical both times.

Can someone explain why data length has not moved much at all? Is data length just a quick and dirty approximation? Is there some other compaction step MySQL will eventually do? Or have I totally misinterpreted the use of these fields?

Thank you!

Update

I may have figured it out -- at time t = 0:

  • DATA_FREE = 77594624

Four hours later:

  • DATA_FREE = 256901120

That means I have increased DATA_FREE by 256901120 - 77594624 = 179306496 bytes, i.e. approximately 171 MB.

Does this therefore mean that if I insert another 171 MB, it will come out of the DATA_FREE pool, and so my data length will not increase?

Put another way, let's say I start with a new InnoDB table and insert 20 GB of data (assuming that 20 GB incorporates all the InnoDB overhead; I realize data stored in InnoDB is larger than in MyISAM), then I delete all the data, then I insert 10 GB of data. When I run select * from information_schema.tables, I should see a data length of 10 GB and a data free of 10 GB, right? I should not expect to see a data length of 30 GB/data free of 0 GB, nor a data length of 10 GB/data free of 0 GB (as if the freed space had been returned to the operating system)?
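If that reading is right, the thought experiment above could be checked with something like this (the table t and its payload column are made up for illustration; the INFORMATION_SCHEMA columns are the real ones):

    -- Hypothetical table for the 20 GB / delete-all / 10 GB experiment
    CREATE TABLE t (
        id      INT AUTO_INCREMENT PRIMARY KEY,
        payload LONGBLOB
    ) ENGINE=InnoDB;

    -- ... insert ~20 GB of rows ...
    DELETE FROM t;   -- frees pages inside the tablespace, not back to the OS
    -- ... insert ~10 GB of rows ...

    -- If deleted pages are reused, I would expect roughly
    -- DATA_LENGTH ~ 10 GB and DATA_FREE ~ 10 GB here:
    SELECT DATA_LENGTH, DATA_FREE
    FROM   information_schema.tables
    WHERE  TABLE_NAME = 't';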

Update 2

This post on Stack Overflow would also seem to confirm my analysis.

1 Answer


The "data length" of a table includes any free space that may exist within the table. You will probably have to OPTIMIZE the table to defragment the table, freeing up that space. Note that this may lock the table for some time while it does its thing.

Using the InnoDB storage engine (CREATE TABLE ( ... ) ENGINE=InnoDB;) will make table optimization largely unnecessary, as well as making the database generally faster. If you aren't using it already, you should probably start. :)

  • Thanks for the info -- the table is an InnoDB table (I just double-checked), so presumably this isn't the cause? – esilver Nov 04 '11 at 06:03
  • I might have found an answer/have a theory; since it sounds like you're up on this stuff could you LMK if you think my analysis is correct? Thanks! – esilver Nov 04 '11 at 07:01
  • Yes, your analysis appears to be sound. –  Nov 04 '11 at 07:12
  • 2
    Data length likely does not include free disk space. On a table that I recently performed major maintenance on, I have the following values: 1.35GB data length, 1.3GB index length, 6.1GB data free. Also, this was an InnoDB table. Using InnoDB does not "make table optimization largely unnecessary" – Matthew Kolb Nov 06 '14 at 16:28