
I'm currently taking the course "Performance Evaluation" at university, and we're now doing an assignment where we test the CPU usage of a PHP and MySQL database server. We use httperf to generate custom traffic and vmstat to track the server load. We run 3000 connections against the PHP server, for both INSERT and DELETE (run separately).

The numbers show that the DELETE operation is a lot more CPU-intensive than INSERT, and I'm wondering why.

I initially thought INSERT would require more CPU, since indexes need to be updated, data needs to be written to disk, etc. But obviously I'm wrong, and I'm wondering if anyone can tell me the technical reason for this.

Kevin Reid
Trond
  • Obvious question: Is it *always* so that a DELETE is more resource intensive than an INSERT, or could it be just your specific set-up? If it is always so, who says that? – Tomalak Feb 17 '11 at 20:08

3 Answers


At least with InnoDB (and I hope that's what you're using), a delete involves more operations than an insert, even with no foreign keys. An insert is roughly this:

  1. Insert row
  2. Mark in binary log buffer
  3. Mark committed

Deletions do the following:

  1. Mark row removed (taking the same hit as an insertion -- page is rewritten)
  2. Mark in binary log buffer
  3. Mark committed
  4. Actually remove the row (taking the same hit as an insertion -- page is rewritten)
  5. The purge thread tracks the deletion in the binary log buffer too.

So a delete does roughly twice the work of an insert. A delete requires those two writes because the row must be marked as removed for all versions going forward, but it can only be physically removed once no transaction that can still see it remains. Because InnoDB only writes full blocks to disk, the modification penalty for a block is constant, however small the change.
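The two-phase delete described above can be modelled with a toy Python sketch. Everything in it (the `Page` class, `insert_row`, `delete_row`, `purge_row`, and counting one write per page modification) is a hypothetical simplification, not InnoDB's actual code; it only shows why a delete touches a page twice while an insert touches it once.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for one on-disk page. Not InnoDB's real page format."""
    rows: dict = field(default_factory=dict)  # key -> [value, delete_marked]
    writes: int = 0                           # times the page was modified


def insert_row(page: Page, key, value) -> None:
    # Insert: a single page modification.
    page.rows[key] = [value, False]
    page.writes += 1


def delete_row(page: Page, key) -> None:
    # Delete, step 1: only set a delete mark, because transactions that
    # started earlier may still need to see this row.
    page.rows[key][1] = True
    page.writes += 1


def purge_row(page: Page, key, row_still_visible: bool) -> bool:
    # Delete, step 2 (done later by a purge step): physically remove the
    # row, but only once no open transaction can still see it.
    if row_still_visible or not page.rows[key][1]:
        return False
    del page.rows[key]
    page.writes += 1
    return True


if __name__ == "__main__":
    p = Page()
    insert_row(p, 1, "a")
    after_insert = p.writes                   # 1 write for the insert
    delete_row(p, 1)
    purge_row(p, 1, row_still_visible=True)   # blocked: an old reader remains
    purge_row(p, 1, row_still_visible=False)  # reader gone: row removed
    print(after_insert, p.writes - after_insert)  # prints: 1 2
```

The purge step is kept separate from `delete_row` on purpose: the physical removal cannot happen at delete time, only after older readers are gone, which is exactly why the second page write exists.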

Jeff Ferland

DELETE also requires data to be written to disk and indexes to be updated, plus a set of logical comparisons to find the record(s) you are trying to delete in the first place.
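To make the "find the record first" point concrete, here is a hypothetical pure-Python toy where a table is just a list with no index: an insert simply appends, while a delete has to compare rows against its WHERE-style condition before it can remove anything. Real engines use indexes to shrink this search to a tree lookup, so treat the comparison counts as a worst-case illustration only; `insert_record` and `delete_where` are made-up names.

```python
def insert_record(table: list, record: dict) -> int:
    """Append a record; no search is needed. Returns comparisons made (0)."""
    table.append(record)
    return 0


def delete_where(table: list, column: str, value) -> tuple:
    """Delete every record whose `column` equals `value` by scanning the table.

    Returns (rows_deleted, comparisons). Without an index, every existing
    row must be compared once to find the matches.
    """
    comparisons = 0
    deleted = 0
    kept = []
    for row in table:
        comparisons += 1
        if row[column] == value:
            deleted += 1
        else:
            kept.append(row)
    table[:] = kept  # rewrite the table in place without the deleted rows
    return deleted, comparisons


if __name__ == "__main__":
    t = []
    for i in range(1000):
        insert_record(t, {"id": i, "name": f"user{i}"})
    print(delete_where(t, "id", 500))  # prints: (1, 1000)
    print(len(t))                      # prints: 999
```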

Randy
  • This argument depends a lot on the constraints in the database, and whether the data being inserted could be affected by those constraints. – Harper Shelby Feb 17 '11 at 20:06

Delete requires more logic than you think; how much so depends on the structure of the schema.

In almost all cases, when deleting a record, the server must check for any dependencies on that record as a foreign key reference. That, in a nutshell, is a query of the system tables looking for table definitions with a foreign key reference to this table, followed by a select on each of those tables for records referencing the record being deleted. Right there you've increased the computation time by a couple of orders of magnitude, regardless of whether the server performs cascading deletes or just returns an error.
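The foreign key check on delete, and its mirror image on insert, can be seen in a small Python sketch. It uses SQLite through the standard-library `sqlite3` module rather than MySQL/InnoDB, and the `parent`/`child` tables and the `run`/`demo` helpers are made up for illustration: deleting a parent forces the engine to look in `child` for referencing rows, and inserting a child forces it to look in `parent` for the referenced key.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str) -> str:
    """Execute one statement; report whether FK enforcement let it through."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    # Index the referencing column so the delete-side check is a lookup,
    # not a full scan of `child`.
    conn.execute("CREATE INDEX child_parent_idx ON child(parent_id)")
    conn.execute("INSERT INTO parent (id) VALUES (1), (2)")
    conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")

    results = {
        # Delete side: child row 10 still points at parent 1.
        "delete referenced parent": run(conn, "DELETE FROM parent WHERE id = 1"),
        # Same check finds no child for parent 2, so the delete succeeds.
        "delete unreferenced parent": run(conn, "DELETE FROM parent WHERE id = 2"),
        # Insert side: parent 99 does not exist.
        "insert orphan child": run(
            conn, "INSERT INTO child (id, parent_id) VALUES (11, 99)"
        ),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for action, outcome in demo().items():
        print(f"{action}: {outcome}")
```

Both directions cost a lookup in another table; which side is more expensive then depends on how many tables reference the row and whether their referencing columns are indexed.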

Self-balancing internal data structures would also have to be reorganized, and indexes updated to remove any now-empty branches of the index trees, but these costs have counterparts in insert operations.

KeithS
  • If there is no foreign key (outgoing) registered for that table, there is no additional work. If there is a foreign key (incoming), there is just as much work for an insert. The foreign keys argument is not strong enough IMHO. – Tomalak Feb 17 '11 at 20:10
  • I disagree. If there is no foreign key referencing this table elsewhere, you still have to verify that with a table scan of sysobjects (or whatever) before removing the record. Not so when inserting. If the record references a foreign table, an insert IS more expensive, but not by much; you have to find the record with the referenced ID in the referenced table. The referenced table is either statically known or is discovered with a log-time search on sysobjects (et alii) to pull the current table's definition. Finding zero or one records with the referenced ID is also log-time. – KeithS Feb 17 '11 at 20:17
  • Assuming that you have to check with an *actual separate query* for foreign keys with every delete request (which I refuse to believe, as this is such an obvious thing to optimize for that they probably did it) - you'd *still* have to do the same for every insert. Assuming further that there are no FKs in either direction, then a delete is semantically no different from an insert. An insert even requires actual bytes written to disk, even an index split maybe, but a delete - none of that. *If* deletes are generally slower (FKs aside), I sure would like to know why. – Tomalak Feb 17 '11 at 20:25
  • Why would you have to check for a reference to the record you're inserting? It doesn't exist yet; nothing else could possibly reference it even if there were a ref to the table itself from somewhere. – KeithS Feb 17 '11 at 20:30
  • If your inserted record contains keys from another table (that's what I called "incoming"), then the DB must make sure the key values actually exist in these other tables - or you can't insert the record. It's the *vice versa* operation to checking that no other record references this one when you're about to delete it. Or am I misunderstanding something here? – Tomalak Feb 17 '11 at 20:35
  • See my answer. Transactions and versioning require more work for deletes than for writes regardless of foreign keys. KeithS seems to misunderstand the application of foreign keys by the database. – Jeff Ferland Feb 17 '11 at 20:39
  • I think we're crossing wires on what incoming and outgoing mean. I'll restate: deleting a record that may BE a foreign key requires checking all tables on which that record may be the foreign key. A delete of that record will be more expensive than the insert of that record, because a new record inserted into a table cannot be any other record's foreign key yet and so this check wouldn't be performed. Now, if the new record CONTAINS foreign key references, then yes, the insert will be more expensive because you have to check the validity of the references (namely that they exist). – KeithS Feb 17 '11 at 21:17
  • However, it is less expensive to discover the foreign references of one table than to search all other tables for a reference to one table. So the insert would add an x·log(y) operation when inserting a record containing foreign key references, but the delete would add x·y operations: scan the table definitions for FK references, then scan those tables for actual references to the record. – KeithS Feb 17 '11 at 21:20
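The x·log(y) versus x·y comparison in the last comment can be illustrated by counting comparisons in a toy model. Here x is taken as the number of related tables and y as the rows in each; the function names and the choice to probe for an absent key (the usual case when the check passes) are hypothetical assumptions. The third entry shows the same delete-side check when the referencing column is indexed.

```python
def linear_probe(keys: list, target) -> int:
    """Comparisons needed to test for `target` with a full scan (no index)."""
    comparisons = 0
    for k in keys:
        comparisons += 1
        if k == target:
            break
    return comparisons


def indexed_probe(sorted_keys: list, target) -> int:
    """Comparisons needed to test for `target` with a binary search (index-like)."""
    lo, hi, comparisons = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return comparisons


def fk_check_costs(x: int, y: int) -> dict:
    """x related tables of y rows each; probe for a key that is not present."""
    keys = list(range(y))
    missing = y  # larger than every stored key, so every probe fails
    return {
        "insert (x * log y)": x * indexed_probe(keys, missing),
        "delete, unindexed (x * y)": x * linear_probe(keys, missing),
        "delete, indexed (x * log y)": x * indexed_probe(keys, missing),
    }


if __name__ == "__main__":
    print(fk_check_costs(x=3, y=1024))
```

Whether the unindexed case applies in practice is engine-specific: the MySQL manual states that InnoDB requires an index on foreign key columns and creates one automatically if none exists, which puts the delete-side check closer to the indexed row.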