1

I've partitioned tables in my MySQL 5.1.41 which hold very huge amount of data. Recently, I've deleted a lot of data which caused fragmentation of around 500 GB yet there is a lot of data in the partitions.

To reclaim that space to the OS, I had to de-fragment the partitions. I referred to MySQL documentation, https://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html which confused me with the following statements,

Rebuilding partitions : Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then reinserting them. This can be useful for purposes of defragmentation.

Optimizing partitions : If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned table with variable-length rows (that is, having VARCHAR, BLOB, or TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to reclaim any unused space and to defragment the partition data file.

I tried both and observed sometimes "rebuild" happens faster and sometimes "optimize". Each partition I run these commands on, has records from millions to sometimes billions. I'm aware of what MySQL does for above each statement.

Do they need to be applied based on number of rows in the partition? If so, on how many rows I can use "optimize" and on how many I should use "rebuild"?

Also, which is better to use?

Smita Ahinave
  • 1,901
  • 7
  • 23
  • 42
Yashwanth Aluru
  • 1,125
  • 6
  • 21
  • 28

1 Answers1

1

MyISAM or InnoDB? (The answer will be different.)

For MyISAM, REBUILD/REORGANIZE/OPTIMIZE will take about the same effort per partition.

For InnoDB, OPTIMIZE PARTITION rebuilds all partitions. So, don't use this if you want to do the partitions one at a time. REORGANIZE PARTITION of the partition into an identical partition definition should act only on the one partition. I recommend that.

It is generally not worth using partitioning unless you have a least a million rows. Also BY RANGE is the only form that has any performance benefits that I have found.

Perhaps the main use of partitioning is with a time-series where you want to delete "old" data. PARTITION BY RANGE with weekly or monthly partitions lets you very efficiently DROP PARTITION rather than DELETE. More in my blog.

(My answer applies to all versions through 5.7, not just your antique 5.1.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Hey Rick! Thanks for your answer. But that was not the one I'm looking for. I referred your blog post which concentrated only on RANGE partitioning. My tables are LIST partitioned. Maybe I should've specified that in my question. They are of MyISAM storage engine. I need to know which maintenance strategy should be applied for my scenario. I run ALTER... REBUILD/OPTIMIZE on individual partitions. I need to know whether REBUILD and OPTIMIZE should be done based on number of rows in the partition and which gets completed faster... :) – Yashwanth Aluru Feb 25 '16 at 10:57
  • `LIST` _might_ be useful if it is used somewhat like `RANGE`. (`HASH` is probably always useless`.) As for `REBUILD`/`OPTIMIZE` -- both (on MyISAM do essentially the same thing -- copy the data over and rebuild the indexes. So, they are very likely to take the same amount of time. – Rick James Feb 26 '16 at 22:34