0

document headers are in omdok table:

create table omdok ( dokumnr serial primary key, ... );

document rows are in omrid table

CREATE TABLE omrid
(
  id serial NOT NULL,
  reanr serial NOT NULL,
  dokumnr integer NOT NULL,
  CONSTRAINT omrid_pkey PRIMARY KEY (id),
  CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
      REFERENCES omdok (dokumnr) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  ....
);

Records in omdok which do not have child rows in omrid needs to be deleted

I tried

delete from omdok where dokumnr not in      (select dokumnr from omrid)

Query it is running currently 15 hours and is still running. postgres.exe is using 50% CPU all the time (this is 2 core CPU).

explain delete from omdok where dokumnr not in      (select dokumnr from omrid)

returns:

"Delete  (cost=0.00..21971079433.34 rows=220815 width=6)"
"  ->  Seq Scan on omdok  (cost=0.00..21971079433.34 rows=220815 width=6)"
"        Filter: (NOT (SubPlan 1))"
"        SubPlan 1"
"          ->  Materialize  (cost=0.00..94756.92 rows=1897261 width=4)"
"                ->  Seq Scan on omrid  (cost=0.00..77858.61 rows=1897261 width=4)"
  • How to delete parents without child rows fast?
  • Will this command finish or is postgres hanging ?
  • Currently it is running 15 hours. How many hours it takes to finish ? How to speed up this query ?

Using

PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit

Windows 2003 x64 server with 4 GB RAM.
Andrus
  • 26,339
  • 60
  • 204
  • 378

2 Answers2

1

You do not qualify your DELETE query in any way so the 220,815 documents are compared against the 1,897,261 document rows. That will take time.

The simplest optimization is using a DISTINCT clause on the document rows, which should bring the rows from omrid down by a factor 8 or so:

DELETE FROM omdok WHERE dokumnr NOT IN (SELECT DISTINCT dokumnr FROM omrid);

A potentially faster solution is to first identify the documents without rows, then delete those rows:

WITH docs0rows AS (
  SELECT dokumnr
  FROM omdok d
  LEFT JOIN (SELECT DISTINCT dokumnr FROM omrid) dr ON dr.dokumnr = d.dokumnr
  WHERE dr.dokumnr IS NULL
)
DELETE FROM omdok d
USING docs0rows zero
WHERE d.dokumnr = zero.dokumnr;

Disclaimer: Test this command before you run it to see which rows will be deleted.

Patrick
  • 29,357
  • 6
  • 62
  • 90
  • it causes error column reference "dokumnr" is ambiguous. Probably omdok.dokumnr should used in delete where clauset. Will using clause work in 9.0 or should it moved to from ? – Andrus Sep 26 '15 at 08:26
  • In the `DELETE` statement you use `USING` instead of `FROM` to specify additional tables to build conditional clauses for filtering. Works in all versions. Answer corrected. – Patrick Sep 26 '15 at 08:35
  • Why aliases d and zero are used in delete ? is it safe to remove them and use table names instead ? – Andrus Sep 26 '15 at 08:37
0

Another alternative is to simply create an index on omrid(dokumnr):

create index idx_omrid_dokumnr on omrid(dokumnr);

This should speed the processing of the not in in your original query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • psql-genral mailing list recommends to replace it with `delete from omdok where not exists(select 1 from omrid where omdok.dokumnr = omrid.dokumnr);` – Andrus Sep 26 '15 at 14:06
  • @Andrus . . . I do prefer the `not exists` as well, but I think the index will work for either formulation. – Gordon Linoff Sep 26 '15 at 22:19