Delete number of rows from table Transact-SQL

Question

In general it's necessary to delete about 110K rows from table Nodes but their id are not declared explicitly but are given from another table.

DECLARE @gh Table(id int);
INSERT INTO @gh (id) SELECT node_id FROM Relations;
DELETE FROM Nodes WHERE id NOT IN (SELECT DISTINCT id FROM @gh)

Selection (SELECT DISTINCT id FROM @gh) contains about 20K rows and server couldn't delete them due to expired timeout. What other ways to do it are there?

Can you add the XML execution plan please (estimated will do, or use `DELETE TOP (10) FROM Nodes ...`) — gbn, Feb 09 '12 at 08:51

gbn · Answer 1 · 2012-02-09T08:50:38.830

3

If you're keeping 20k of 130k, I'd do it a different way

SELECT N.* 
  INTO #NodesTemp 
FROM  
  Relations R JOIN Nodes N ON R.id = N.id;

TRUNCATE TABLE Nodes;

INSERT Nodes SELECT * FROM #NodesTemp;

DROP TABLE #NodesTemp;

Edit:

do you have a DELETE trigger on Nodes?
do you have foreign keys to Nodes that need checked?

edited Feb 09 '12 at 08:50

answered Feb 09 '12 at 07:41

gbn

422,506
82
585
676

The `Nodes` and `Relations` have a `1:n` relationship. Will not this result in rows duplication in the `Nodes` table? – ypercubeᵀᴹ Feb 09 '12 at 08:11
This also won't work if there are foreign keys to the Nodes table. – Serguei Feb 09 '12 at 08:26
@ypercube: possibly, but a DISTINCT will fix that. I was going for a concept, not runnable code – gbn Feb 09 '12 at 08:47
@Serguei: true, but you could then [DELETE in batches with no JOIN](http://stackoverflow.com/a/2126482/27535) or rename tables etc. – gbn Feb 09 '12 at 08:49

Samuel Fraser · Answer 2 · 2012-02-09T06:27:04.883

1

Something like this might help. It will delete all Nodes where there is not a related Relation entry. The outer join should be much faster than using a "not in" clause with the sub query.

delete n
from Nodes n left outer join Relations r on n.id = r.node_id
where r.node_id is null

edited Feb 09 '12 at 06:27

answered Feb 09 '12 at 06:18

Samuel Fraser

56
3

score 1 · Answer 3 · answered Feb 09 '12 at 06:23

1

You can try deleting the nodes in batches instead of all at once by adding a TOP clause to DELETE:

WHILE EXISTS(SELECT * FROM Nodes WHERE id NOT IN (SELECT DISTINCT id FROM @gh))
DELETE TOP (100) Nodes
WHERE id NOT IN (SELECT DISTINCT id FROM @gh)

answered Feb 09 '12 at 06:23

Serguei

2,910
3
24
34

thanks, it seems to work but has appeared another problem with unsufficient memory amount (mashine has 4GB). – RChugunov Feb 09 '12 at 06:34

ypercubeᵀᴹ · Answer 4 · 2012-02-09T08:16:22.887

0

You don't need the DINSTINCT for this. It adds unnecessary overhead to the delete statement.

Second, why are you using a temporary table? Have you tried to combine these 2 statements into one?:

DELETE 
FROM Nodes 
WHERE id NOT IN 
      (SELECT node_id FROM Relations)

or delete the rows in smaller chunks, like @Serguei suggested:

SELECT 1

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (1000)
    FROM Nodes 
    WHERE id NOT IN 
          (SELECT node_id FROM Relations)
END

edited Feb 09 '12 at 08:16

answered Feb 09 '12 at 06:27

ypercubeᵀᴹ

113,259
19
174
235

all the same timeout exception – RChugunov Feb 09 '12 at 06:42
Do you have indexes on the `Relations.node_id` and `Nodes.id` ? – ypercubeᵀᴹ Feb 09 '12 at 06:45

Delete number of rows from table Transact-SQL

4 Answers4