
I've got a table which has about 5.5 million records. I need to delete some records from it based on date. My query looks like this:

DELETE FROM Table WHERE [Date] between '2011-10-31 04:30:23' and '2011-11-01 04:30:42'

It's about 9,000 rows, but this operation takes a very long time. How can I speed it up? The Date column is of type datetime2, and the table has a clustered primary key on an int column. Update and delete triggers are disabled.

Andrew Kozak
viki
  • Is there an index on `[Date]`? – Joe Stefanelli Dec 08 '11 at 19:43
  • Please, consider renaming your column to something other than Date. It's **never** a good idea to use column names that are the same as a reserved word. You'll just end up in a world of pain at some point. It's lazy too - it's not a date, it's a createddate or posteddate or somesuch. Name it appropriately. – Cruachan Dec 08 '11 at 22:06

2 Answers


It's very possible that [Date] is being cast to a string on every row, resulting in a scan of the entire table.

You should try explicitly casting your parameters to datetime instead:

DELETE FROM Table WHERE [Date] between convert(datetime, '2011-10-31 04:30:23') and convert(datetime, '2011-11-01 04:30:42')

Also, make sure there's an index on [Date].
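For example, a minimal sketch (the index name IX_Table_Date is hypothetical; substitute your actual table and column names):

CREATE NONCLUSTERED INDEX IX_Table_Date ON [Table] ([Date]);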

Mike Christensen

First, make sure you have an index on [Date].

If there is an index, check the execution plan and make sure it is actually being used. Note that using an index is not always the most efficient way to process a delete: if you are deleting a large proportion of the records (a rule of thumb is anything in excess of 10%), the additional overhead of the index lookups can be greater than a full scan.
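One way to inspect the plan without actually executing the delete is SHOWPLAN (a sketch using the question's table and range):

SET SHOWPLAN_XML ON;
GO
-- The plan is returned instead of the statement being executed
DELETE FROM [Table] WHERE [Date] BETWEEN '2011-10-31 04:30:23' AND '2011-11-01 04:30:42';
GO
SET SHOWPLAN_XML OFF;
GO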

With a large table it's also well worth making sure that the statistics are up to date (run sp_updatestats), because if the database has an incorrect picture of the number of rows in the table it will make inappropriate choices in its execution plan. For example, with stale statistics the database may decide to ignore your index even though it exists, because it thinks there are far fewer records in the table than there actually are. Odd distributions of dates can have similar effects.
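For instance (the targeted form below assumes the hypothetical index name IX_Table_Date from earlier):

-- Refresh statistics database-wide
EXEC sp_updatestats;
-- or just for this table's index, with a full scan of the data
UPDATE STATISTICS [Table] IX_Table_Date WITH FULLSCAN;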

I'd probably also try dropping the index on date and recreating it. Indexes are B-trees, and if your data has accumulated over time the index may well have become fragmented, so queries can take a long time to find the appropriate data. Both this and the statistics issue should be handled automatically by your database maintenance job, but that's often overlooked.
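On SQL Server 2005 and later you can rebuild in place rather than dropping and recreating by hand (again assuming the hypothetical index name IX_Table_Date):

-- Rebuild the index from scratch
ALTER INDEX IX_Table_Date ON [Table] REBUILD;
-- or, for lighter fragmentation, defragment the leaf level in place
ALTER INDEX IX_Table_Date ON [Table] REORGANIZE;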

Finally, you don't say whether there are many other indexes on the table. If there are, you may be running into the overhead of the database having to maintain and reorganize every one of those indexes as the delete progresses. It's a bit drastic, but one option is to drop all the other indexes on the table before running the delete, then recreate them afterwards.
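A sketch of that approach (the secondary index name and column here are hypothetical):

-- Drop the secondary index, run the delete, then recreate it
DROP INDEX IX_Table_Other ON [Table];
DELETE FROM [Table] WHERE [Date] BETWEEN '2011-10-31 04:30:23' AND '2011-11-01 04:30:42';
CREATE NONCLUSTERED INDEX IX_Table_Other ON [Table] (OtherColumn);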

Cruachan
    Good answer. The other thing you forgot to mention was foreign key constraints. They have to be checked even if you know the child data has already been deleted. This can slow down deletes tremendously. Sometimes you need to do deletes in small batches if you have a lot of FKs (a sketch of this follows below). – HLGEM Dec 08 '11 at 20:11
  • Thank you, I have created a new index on [Date] and that helped speed things up. There was no index before on the [Date] as I did not have any queries that would involve [Date] (there was only a clustered index on PK). Also thanks for the additional information, it is very useful. – viki Dec 08 '11 at 21:20
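A minimal sketch of the batched delete HLGEM describes (the batch size of 1000 is arbitrary; tune it for your workload):

-- Delete the range in batches until no rows remain
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM [Table]
    WHERE [Date] BETWEEN '2011-10-31 04:30:23' AND '2011-11-01 04:30:42';
    IF @@ROWCOUNT = 0 BREAK;
END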