postgresql 9.5.7: INSERT WHERE NOT IN (or NOT EXISTS) not working with bulk-insert of multiple lines at once

Question

I want to copy rows from one table t2 to another t1, while excluding rows with values already existing in t1. The usual approach of 'NOT IN' works fine but only as long there are not multiple occurences of the same value in the source table t2.

Now, assuming I have two tables with the schema:

CREATE TABLE t1 ( id INTEGER );
CREATE TABLE t2 ( id INTEGER );

then insert data into them like:

INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1);
INSERT INTO t2 VALUES (2);

Now, I try to insert all data from t2 into t1 but exclude pre-existing in t1:

INSERT INTO t1 (id) SELECT t2.id FROM t2 
    WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );

it works flawlessly; the row in t2 with the value of '1' did not get insert a second time into t1:

SELECT * FROM t1;

 id 
----
  1
  2
(2 rows)

But when there are multiple occurences of the same value in t2 it doesn't check if they exist in t1 for each individual insert, but for the whole transaction as it seems. Let's continue with my example by:

DELETE FROM t1;

INSERT INTO t2 VALUES (2);

SELECT * FROM t2;

 id 
----
  1
  2
  2
(3 rows)

INSERT INTO t1 (id) SELECT t2.id FROM t2 
    WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );

SELECT * FROM t1;

 id 
----
  1
  2
  2
(3 rows)

The same result is achieved with WHERE NOT EXISTS as well.

Has anyone an idea of how to check for existing values in t1 on an individual row-level to prevent multiple occurences?

I could as well use ON CONFLICT DO ... but I rather not want to since the idea is to split the data coming from t2 into a "clean" t1 and a "dirty" t1_faulty where all the rows are collected which do not fit some given criteria (one of which the uniqueness of id for which I am asking this question).

Could you also show desired results you want to get after the instert for both examples ? For now you have shown us two queries that don't work. I really don't understad the first example - t1 has `1`, you are going to insert `1` and `2` to t1 *excluding rows with values already existing in t1*. For me this query works perfectly - it inserts only `2` and skips `1` — krokodilko, Aug 01 '17 at 19:39
@Abelisto: Yeah, that seems to be the issue. I could solve it like essaferaaimeu has suggested. — steffres, Aug 03 '17 at 18:36
@krokodilko: The desired result would be in t1 : '1, 2' but not two times 2 : '1, 2, 2' like in the last example of my code block. But I could solve it thanks to essaferaaimeu's suggestion. — steffres, Aug 03 '17 at 18:38

score 0 · Accepted Answer · answered Aug 01 '17 at 19:45

I think you could simply filter the records you want from the source table (t2).

you might use distinct on

INSERT INTO t1 (id) SELECT distinct on (t2.id) t2.id FROM t2 
    WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );

or group by

INSERT INTO t1 (id) SELECT t2.id FROM t2 
    WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id ) group by t2.id;

or, if you want only the records that are already unique on t2, add a having count = 1

INSERT INTO t1 (id) SELECT t2.id FROM t2 
    WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id )
    group by t2.id
having count(t2.id) = 1

Thanks for your reply, essaferaaimeu. It does work like oyu have suggested. Thanks for that. :) Cheers! — steffres, Aug 03 '17 at 18:35

postgresql 9.5.7: INSERT WHERE NOT IN (or NOT EXISTS) not working with bulk-insert of multiple lines at once

1 Answers1