I have a 55 GB fact table from which I have to delete some records that may later need to be restored. The number of deleted records varies between 10,000 and 100,000.

Currently my delete strategy is this: I update the dateKey for the records to be deleted, e.g. from the positive int 20080122 to the negative int -20080122, so that current date filters no longer include them.

My thinking is that instead of moving data out of and back into the fact table, I move the record out of the filterable date range, and later back in, purely by updating dateKey.
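For reference, a minimal sketch of this sign-flip approach (the table name `dbo.FactSales`, the key column `transactionId`, and the temp tables are hypothetical placeholders, not from the original post):

```sql
-- "Delete": negate dateKey so date-range filters no longer match these rows.
UPDATE dbo.FactSales
SET    dateKey = -dateKey
WHERE  dateKey > 0
  AND  transactionId IN (SELECT transactionId FROM #RowsToDelete);

-- "Undelete": negate it back into the filterable range.
UPDATE dbo.FactSales
SET    dateKey = -dateKey
WHERE  dateKey < 0
  AND  transactionId IN (SELECT transactionId FROM #RowsToRestore);
```

Note that each such update changes an indexed key value, so every index that includes dateKey must relocate the affected entries.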

I would like to hear your views on this delete strategy, especially around non-clustered index (NCI) behavior. Do you think updating the indexed dateKey is better than moving the actual data?

BumbleBee
  • What happens with queries that don't filter on that column? They would still treat those records as live (not deleted). In that case, updating the column is worse than deleting the rows from the table: the index has to readjust not only for the now-missing values but also for the new negative ones. – EzLo Jan 07 '19 at 12:50
  • DateKey column is always required for queries. – BumbleBee Jan 07 '19 at 12:52
  • Are you using partitions on this fact table? – Dave Brown Jan 07 '19 at 14:36
  • No, there is no partitioning yet, but I am thinking of adding it on a yearly basis for the fact data. Apart from that, I think it would make sense to create a delete partition for negative dateKey entries, but I haven't figured that out yet. – BumbleBee Jan 07 '19 at 16:34

1 Answer

Rather than repurpose the dateKey column, our standard practice is to add a "soft delete" column to the table, either an "is_deleted" bit column or a "deleted_on" datetime column, and use that column to filter out "deleted" rows.

This requires more work on your part, as all of your existing queries will have to be modified to use the new column, but your database no longer has to re-index or delete/insert actual data.
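A minimal sketch of the soft-delete approach, again using the hypothetical `dbo.FactSales` table and `transactionId` key:

```sql
-- Add a nullable marker column (NULL = live row).
ALTER TABLE dbo.FactSales ADD deleted_on datetime NULL;

-- "Delete" rows by stamping them:
UPDATE dbo.FactSales
SET    deleted_on = GETDATE()
WHERE  transactionId IN (SELECT transactionId FROM #RowsToDelete);

-- "Undelete" by clearing the stamp:
UPDATE dbo.FactSales
SET    deleted_on = NULL
WHERE  transactionId IN (SELECT transactionId FROM #RowsToRestore);

-- Existing queries gain one extra predicate:
SELECT SUM(amount)
FROM   dbo.FactSales
WHERE  dateKey BETWEEN 20080101 AND 20080131
  AND  deleted_on IS NULL;
```

Because dateKey itself is never updated, indexes keyed on it stay put; only the marker column (and any index that includes it) changes.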

Tab Alleman
  • I will give this a try and let's see if it gives better results. I will post an update on the results soon. – BumbleBee Jan 08 '19 at 10:27
  • Update: I have added a sparse nullable bit column, as most values will be NULL. I always have the date range and this bit column in my query filter. The delete queries for 100,000 rows now take a few seconds. This strategy works for my scenario, so I am marking this as the correct answer, though it may not be suitable for every scenario. – BumbleBee Jan 16 '19 at 09:35