1

I started an ASP.NET project with Entity Framework 4 for my DAL, using SQL Server 2008. In my database, I have a table Users that should have many rows (5.000.000 for example).

Initially I had my Users table designed like this:

Id  uniqueidentifier    
Name    nvarchar(128)   
Password    nvarchar(128)
Email   nvarchar(128)
Role_Id int 
Status_Id   int

I've modified my table, and added a MarkedForDeletion column:

Id  uniqueidentifier    
Name    nvarchar(128)   
Password    nvarchar(128)
Email   nvarchar(128)
Role_Id int 
Status_Id   int 
MarkedForDeletion   bit

Should I delete every entity each time, or use the MarkedForDeletion attribute. This means that I need to update the value and at some moment in time to delete all users with the value set to true with a stored procedure or something similar.

Wouldn't the update of the MarkedForDeletion attribute cost the same as a delete operation?

p.campbell
  • 98,673
  • 67
  • 256
  • 322
Sorin Antohi
  • 6,145
  • 9
  • 45
  • 71
  • 1
    If you need a history, change MarkForDeletion to a date. Many systems I have developed must track information for years, even if "deleted". – Dave Jarvis Sep 10 '10 at 14:00
  • 1
    @Dave: +1. nullable smalldatetime works perfectly for me in this regard. For internal systems, even `deletedByUserName`. Heck, even move to an audit table altogether. :) – p.campbell Sep 10 '10 at 14:09
  • @Dave: in that case MarkForDeletion would represent the date when the item was "deleted" if i understand correctly. But what would be the impact on the db? Meaning an update is faster than a deletion? From my point of view it is still a database access operation. – Sorin Antohi Sep 10 '10 at 14:13
  • @SorinA: not all 'access operations' are created equal! – p.campbell Sep 10 '10 at 14:17
  • @SorinA: Measure the performance. Chances are, unless you have a high-volume system, an update and a delete will be equal in expense. Modern databases have the ability to defragment the table, physically, during periods of low usage. So the performance penalty for a delete is paid for later, if at all. Add on caching and gobs of RAM and you should not have any worries. – Dave Jarvis Sep 10 '10 at 17:11
  • @SorinA: p.campbell has the right idea: use audit tables. – Dave Jarvis Sep 10 '10 at 17:13

2 Answers2

2

This may depend on what you want to do with the information. For instance, you may want to mark a user for deletion but not delte all his child records (say something like forum posts), inthsi case you should markfor deletion or use a delted date field. If you do this, create a view to use for all active users (called ActiveUsers) , then insist that the view beused in any query for login or where you only want to see the active users. That will help prevent query errors from when you forget to exlude the inactive ones. If your system is active, do not make this change without going through and adjusting all queries that need to use the new view.

Another reason to use the second version is to prevent slowdowns when delting large numbers of child records. They no longer need to be deleted if you use a deleted flag. This can help performance becasue less resources are needed. Additionally you can flag records for deltion and then delte them inthe inthe middle of the night (or move to a history table) to keep the main tables smaller but still not affect performance during peak hours.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
2

Depending on the requirements/needs/future needs of your system, consider moving your 'deleted' entities over to a new table. Setup an 'audit' table to hold those that are deleted. Consider the case where someone wants something 'restored'.

To your question on performance: would the update be the same cost as a delete? No. The update would be a much lighter operation, especially if you had an index on the PK (errrr, that's a guid, not an int). The point being that an update to a bit field is much less expensive. A (mass) delete would force a reshuffle of the data. Perhaps that job belongs during a downtime or a low-volume period.

Regarding performance: benchmark it to see what happens! Given your table with 5 million rows, it'd be nice to see how your SQL Server performs, in its current state of indexes, paging, etc, with both scenarios. Make a backup of your database, and restore into a new database. Here you can sandbox as you like. Run & time the scenarios:

  • mass delete vs.
  • update a bit or smalldatetime field vs.
  • move to an audit table

In terms of books, try:

Community
  • 1
  • 1
p.campbell
  • 98,673
  • 67
  • 256
  • 322