1

Does a DELETE statement on an InnoDB table in MySQL cause the table to be physically reordered? I have a huge (200 million) row table that I want to delete certain records from. When I delete from it based on a WHERE condition it takes something like 90 minutes but only affects a few hundred thousand rows.

According to this stackoverflow answer, that's because deleting rows causes the table to be physically reordered on the disk: How can I improve DELETE FROM performance on large InnoDB tables?

I used the solution they provided and it worked: Make a new table and insert only the rows you want to keep, then drop the original. BUT! The DBA ran the same straight DELETE on a copy of the database and it worked in 5 minutes. Even with caching I'm surprised.

I can't find any documentation from Oracle or otherwise proving the physical reordering is happening. Does anyone know where I can find documentation stating this is the case? Or some way to prove if this is the cause?

Community
  • 1
  • 1
AS7K
  • 417
  • 4
  • 20
  • Did the DBA run the DELETE on the same database instance? Specific hardware and configuration can make a big difference. Note the reply in the linked question about the buffer size, for example. – user2864740 Aug 01 '16 at 00:05
  • 2
    The claim about physical reordering seems most improbable. I would expect it to (a) remove all the index entries concerned, (b) mark the row as deleted, and (c) release the space to the free list. If databases physically reordered tables on every DELETE statement nothing else would ever happen. – user207421 Aug 01 '16 at 00:28
  • Assuming you establish that that's the cause, what next? – Strawberry Aug 01 '16 at 00:29
  • 1
    No, a DELETE does not physically order the table. If the number of rows deleted is large, this will probably cause fragmentation in many if not all the indexes, including the clustered index (i.e. the table) and subsequent page merges but not a physical rewrite of the whole indexes. See this related part of the documentation: http://dev.mysql.com/doc/refman/5.7/en/index-page-merge-threshold.html – ypercubeᵀᴹ Aug 01 '16 at 00:30
  • If you are deleting a lot, copying the table over can be faster. – Rick James Aug 01 '16 at 15:10

1 Answers1

3

How did you find the rows to delete? Is there an index? If you need to scan the whole table, that takes a lot for 200M rows.

(Assuming InnoDB, ...)

For each row to delete, it does these:

  • Queue up changes to the indexes.
  • Generate undo information, that will need to be cleaned out later.
  • Remove the record from the block where it lives.
  • If a block becomes 'small', coalesce with an adjacent block.

Other comments:

  • If the table is bigger than the buffer_pool, you are likely to incur a lot of I/O.
  • If the rows are randomly arranged, there could be I/O to get to the desired rows.
  • If the key is a UUID, you are very likely to need I/O.
  • If the amount of undo stuff is 'too big', the process slows down be cause of having to work harder to save it.

Another aspect -- if there are SELECTs against the table at the same time, there is some interaction that slows things down.

No, the table is not rebuilt, only little parts may be rearranged.

A possible improvement is to walk through the rows to delete, doing no more than 1000 at a time. COMMIT each change before moving on to the next chunk. More details, and more suggestions.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Yes, InnoDB. Found the rows to delete based on an indexed varchar column. Still slow, though. I think it's plausible that this is a problem with blocks being coalesced with an adjacent block as you say, the related index page being merged with a neighboring index page as suggested by documentation provided by ypercube, the table being bigger than the buffer_pool (it is), or a combination of them. At this point I'm looking at breaking it into smaller tables for this and other reasons. Thanks! – AS7K Aug 01 '16 at 14:47
  • This _might_ help: Sort the list to delete by `PRIMARY KEY` of the main table. – Rick James Aug 01 '16 at 15:08
  • Why break into smaller tables? Instead "paginate" through the to-be-deleted table. Do _not_ use `OFFSET`; instead, "remember where you left off". – Rick James Aug 01 '16 at 15:09
  • If you are getting a list of things to delete once a day; change it to getting a list once an hour?? – Rick James Aug 01 '16 at 15:09
  • I only need to delete occasionally, it's part of a cleanup project not part of the normal operation. Was considering breaking into smaller tables for other reasons anyway so this would just be an added benefit. Thanks though! – AS7K Aug 01 '16 at 15:46
  • @AS7K - Another thing: The _rows_ in InnoDB table is _always_ "ordered" by the `PRIMARY KEY`. This is close to addressing your _title_, but may not be what you really asking. – Rick James Nov 15 '21 at 00:36