Not sure what the best way to handle this is. For my particular situation I have numerous tables where I want to delete any rows with a timestamp older than 3 months... aka only keep records for the last 3 months.

Very simply, it would be something like this:

//set binding cutoff timestamp
$binding = array(
    'cutoff_time' => strtotime('-3 months')
);

// run through all the logs and delete anything before the cutoff time

//app
$stmt = $db->prepare("
    DELETE
    FROM app_logs           
    WHERE app_logs.timestamp < :cutoff_time
");
$stmt->execute($binding);

//more tables after this

Every table I will be deleting from has an indexed timestamp column. My concern is down the road, when the number of rows to delete is large. What would be the best practice for limiting the deletes to chunks in a loop? All I can think of is doing an initial SELECT to find whether there are any rows that need to be deleted, running the delete if there are, and repeating until the SELECT finds no results. That adds an additional count query for each iteration of the loop.

What is the standard/recommended practice here?

EDIT:

Quick writeup of what I was thinking:

//set binding cutoff timestamp
$binding = array(
    'cutoff_time' => strtotime('-3 months')
);

//set the limit (chunk size) for each delete
$limit = 1000;

// run through all the logs and delete anything before the cutoff time

//get the total count
$stmt = $db->prepare("
    SELECT
        COUNT(*)
    FROM app_logs
    WHERE app_logs.timestamp < :cutoff_time
");
$stmt->execute($binding);

//get total results count from above (fetchColumn reads the single COUNT(*) column)
$found_count = $stmt->fetchColumn();

// loop deletes
// note: the LIMIT placeholder must be bound as an integer; passing it
// through execute() binds it as a string, which breaks the query
$stmt = $db->prepare("
    DELETE
    FROM app_logs
    WHERE app_logs.timestamp < :cutoff_time
    LIMIT :limit
");
$stmt->bindValue(':cutoff_time', $binding['cutoff_time'], PDO::PARAM_INT);
$stmt->bindValue(':limit', $limit, PDO::PARAM_INT);

while($found_count > 0)
{
    $stmt->execute();

    $found_count = $found_count - $limit;
}
user756659
  • Best practice: don't optimise until you need to. Your method may work perfectly well under all circumstances. Address the problem when it becomes apparent that you have one. – Hobo Sapiens Feb 02 '15 at 22:49
  • Doing a `select` to determine if you need to delete will make things worse performance-wise. `Delete` already does its own select to see if it needs to delete. – developerwjk Feb 02 '15 at 22:55
  • @developerwjk - exactly, but there is no other way to loop with a limited delete query without doing the select. – user756659 Feb 02 '15 at 23:08
  • @Hobo Sapiens - I hear what you are saying, but in this case I would rather knock it out from the start than wait for the problem to actually occur. Since I will run this once daily at most, the few changes needed to handle these deletes in a limited fashion would hardly be a performance hit. – user756659 Feb 02 '15 at 23:13

1 Answer

It depends on your table size and its workload, so you can try a few iterations:

  1. Just delete everything that is older than 3 months. Check whether the timing is good enough. Is there performance degradation or are there table locks? How does your app handle the deletion window?

  2. In case that is bad, consider deleting with a LIMIT of 10k or so in a loop (a sketch follows this list). Check it as above, and add proper indexes.

  3. If it's still bad, consider selecting the primary keys first and then deleting by PK with a 10k limit and pauses between queries (a second sketch follows below).

  4. Still bad? Add a new "to delete" column and perform the operation on it, with all of the requirements above.
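
For option 2, a minimal sketch of the chunked delete, reusing the question's $db connection and app_logs table. Stopping the loop via PDOStatement::rowCount() rather than a preliminary count is my own suggestion, not something from the question:

// set the cutoff and chunk size
$cutoff = strtotime('-3 months');
$limit  = 1000;

// prepare the chunked delete; LIMIT must be bound as an integer,
// otherwise PDO quotes it as a string and MySQL rejects the query
$stmt = $db->prepare("
    DELETE
    FROM app_logs
    WHERE app_logs.timestamp < :cutoff_time
    LIMIT :limit
");
$stmt->bindValue(':cutoff_time', $cutoff, PDO::PARAM_INT);
$stmt->bindValue(':limit', $limit, PDO::PARAM_INT);

// keep deleting until a chunk comes back short
do {
    $stmt->execute();
} while ($stmt->rowCount() === $limit);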

There are a lot of tricks for rotating tables. Try something and see what meets your needs.
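
For option 3, a rough sketch, assuming app_logs has an integer primary key named id (a column the question never names):

// set the cutoff and batch size
$cutoff = strtotime('-3 months');
$batch  = 10000;

// collect the primary keys of the rows past the cutoff
// (for a very large backlog you would page this SELECT as well)
$stmt = $db->prepare("
    SELECT id
    FROM app_logs
    WHERE app_logs.timestamp < :cutoff_time
");
$stmt->execute(array('cutoff_time' => $cutoff));
$ids = $stmt->fetchAll(PDO::FETCH_COLUMN, 0);

// delete by PK in batches, pausing between queries
foreach (array_chunk($ids, $batch) as $chunk) {
    $placeholders = implode(',', array_fill(0, count($chunk), '?'));
    $del = $db->prepare("DELETE FROM app_logs WHERE id IN ($placeholders)");
    $del->execute($chunk);

    sleep(1); // let other queries through between batches
}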

marc_s