1

I want to delete rows based on the combination of two columns. My table looks like this (simplified):

[ID], [Sub_ID], [Value] 

Values could for example be:

1234 - 1 - 100
1234 - 2 - 50
5678 - 1 - 90
4321 - 1 - 75
4321 - 2 - 75

I want to delete all records except for some specific combinations of [ID] and [Sub_ID]. Example: delete all combinations except for combination 1234-2 and 4321-2.

*EDIT: The 2 values are an example, in reality I need to maintain well over 10,000 combinations of ID-Sub_ID.

To do this I combine the two ID columns with a cast and delete everything that does not match this combination.

Delete 
from table
    where   
    CAST(ID as varchar(4))+'-'+Cast(Sub_ID as varchar(1)) not in
    ('1234-2', '4321-2')

This works, but it is slow and probably very inefficient. It already takes several minutes to execute this query and I will expand the selection each month, probably making it even worse each time, Does anyone know how I can make this more efficient?

Many thanks, Steven

user3402229
  • 13
  • 1
  • 4

4 Answers4

0

You could use a CTE which selects all records that should not be deleted, then you can join it with the original table:

WITH Keep AS
(
    SELECT ID=1234, Sub_ID=2
      UNION ALL
    SELECT ID=4321, Sub_ID=2
)    
SELECT t.* 
FROM Table1 t INNER JOIN Keep k
  ON t.ID = k.ID AND t.Sub_ID = k.Sub_ID 

This shows what you'll keep: demo

If you want to delete the other you can use NOT EXISTS:

WITH Keep AS
(
    SELECT ID=1234, Sub_ID=2
      UNION ALL
    SELECT ID=4321, Sub_ID=2
)   
DELETE t FROM Table1 t WHERE NOT EXISTS
(
   SELECT 1 FROM Keep k 
   WHERE k.ID = t.ID AND k.Sub_ID = t.Sub_ID
)

Demo

This approach should be efficient and readable.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Thanks Tim! This is going in the right direction, the only problem is that the CTE would have to contain far more records than the 2 I mentioned (probably should have said that in the question..) It will be well over 10,000 ID-Sub_ID combinations at some point. I receive Excel files with the two ID's, which I combined in Excel to add to the 'not in' statement. Any way to easily enter that into the CTE? – user3402229 Mar 10 '14 at 15:37
  • @user3402229: you could fill a temp-table and select that in the CTE. – Tim Schmelter Mar 10 '14 at 15:44
  • Great, I got it to work like that. Process time went down from 8 minutes 53 seconds to 35 seconds! Thanks a lot everyone for helping out – user3402229 Mar 10 '14 at 16:46
0

Performing a transform on the left side of the where clause will usually result in poor performance. I would suggest staging the rows you want to delete in a temp table and then simply join to it in your delete. When you need to expand the delete criteria just add an insert into this staging table (@delete):

declare @t table (ID int, Sub_ID int, Value int)
insert into @t 
    select 1234, 1, 100 union all
    select 1234, 2, 50 union all
    select 5678, 1, 90 union all
    select 4321, 1, 75 union all
    select 4321, 2, 75;


--stage the combination IDs you want to delete:
declare @delete table (ID int, Sub_ID int);
insert into @delete
    select 1234, 2 union all
    select 4321, 2;

delete  t
from    @t t
join    @delete d on 
        t.ID = d.ID and t.Sub_ID = d.Sub_ID;


select * from @t;
nathan_jr
  • 9,092
  • 3
  • 40
  • 55
0

Do not combine your columns and convert them to Strings, this is going to be terribly slow as you already noticed. Filter on the columns itself with something like the following:

DELETE FROM table WHERE {
  NOT((ID = x0 AND SUB_ID = y0) OR (ID = x1 AND SUB_ID = y1))

}
pintxo
  • 2,085
  • 14
  • 27
0

Possible solitions to the issue:

  1. It looks like ID and SubID are numbers, so don't compare strings, compare numbers instead.

  2. Create an index that includes ID and SubID

  3. DELETE is a heavy statement, that requires a lot of db-writes and can leave indexes fragmented. If you table is big enough and you know that you are deleting at least one-third of the entire table, use a temporary table and execute INSERT ... SELECT statement to insert REMAINING records to a temporary table. Remove all records from your main table and switch table names after that. You usually need to do it in one transaction. Test table performance before implementing. Examples: Bulk DELETE on SQL Server 2008 (Is there anything like Bulk Copy (bcp) for delete data?)

Community
  • 1
  • 1
Slava
  • 1,065
  • 5
  • 11