1

In general it's necessary to delete about 110K rows from table Nodes but their id are not declared explicitly but are given from another table.

DECLARE @gh Table(id int);
INSERT INTO @gh (id) SELECT node_id FROM Relations;
DELETE FROM Nodes WHERE id NOT IN (SELECT DISTINCT id FROM @gh)

Selection (SELECT DISTINCT id FROM @gh) contains about 20K rows and server couldn't delete them due to expired timeout. What other ways to do it are there?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
RChugunov
  • 794
  • 2
  • 11
  • 26
  • Can you add the XML execution plan please (estimated will do, or use `DELETE TOP (10) FROM Nodes ...`) – gbn Feb 09 '12 at 08:51

4 Answers4

3

If you're keeping 20k of 130k, I'd do it a different way

SELECT N.* 
  INTO #NodesTemp 
FROM  
  Relations R JOIN Nodes N ON R.id = N.id;

TRUNCATE TABLE Nodes;

INSERT Nodes SELECT * FROM #NodesTemp;

DROP TABLE #NodesTemp;

Edit:

  • do you have a DELETE trigger on Nodes?
  • do you have foreign keys to Nodes that need checked?
gbn
  • 422,506
  • 82
  • 585
  • 676
  • The `Nodes` and `Relations` have a `1:n` relationship. Will not this result in rows duplication in the `Nodes` table? – ypercubeᵀᴹ Feb 09 '12 at 08:11
  • This also won't work if there are foreign keys to the Nodes table. – Serguei Feb 09 '12 at 08:26
  • @ypercube: possibly, but a DISTINCT will fix that. I was going for a concept, not runnable code – gbn Feb 09 '12 at 08:47
  • @Serguei: true, but you could then [DELETE in batches with no JOIN](http://stackoverflow.com/a/2126482/27535) or rename tables etc. – gbn Feb 09 '12 at 08:49
1

Something like this might help. It will delete all Nodes where there is not a related Relation entry. The outer join should be much faster than using a "not in" clause with the sub query.

delete n
from Nodes n left outer join Relations r on n.id = r.node_id
where r.node_id is null
1

You can try deleting the nodes in batches instead of all at once by adding a TOP clause to DELETE:

WHILE EXISTS(SELECT * FROM Nodes WHERE id NOT IN (SELECT DISTINCT id FROM @gh))
DELETE TOP (100) Nodes
WHERE id NOT IN (SELECT DISTINCT id FROM @gh)
Serguei
  • 2,910
  • 3
  • 24
  • 34
  • thanks, it seems to work but has appeared another problem with unsufficient memory amount (mashine has 4GB). – RChugunov Feb 09 '12 at 06:34
0

You don't need the DINSTINCT for this. It adds unnecessary overhead to the delete statement.

Second, why are you using a temporary table? Have you tried to combine these 2 statements into one?:

DELETE 
FROM Nodes 
WHERE id NOT IN 
      (SELECT node_id FROM Relations)

or delete the rows in smaller chunks, like @Serguei suggested:

SELECT 1

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (1000)
    FROM Nodes 
    WHERE id NOT IN 
          (SELECT node_id FROM Relations)
END
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235