Problems Add/Update Big Data on PostgresSQL

Question

I want to add large amounts of data to the table. Before adding, I check whether the data exists in the table or not.

I am dealing with the following:

Example:

Table users

id | name | address

.. | .... | .......

select id from users where id = ... and name = ...

if not exist

insert....

if exist

update ....

My problem is the time taken too long. I wonder if everyone has a solution to solve this problem faster?

Jim Jones · Answer 1 · 2020-03-19T11:58:53.013

You actually do not need to perform this check manually. It is rather the job of a constraint, e.g. via a primary key.

Table with a primary key constraint based on id and name:

CREATE TABLE users (
id INT, name TEXT, address TEXT,
PRIMARY KEY (id,name));

So, if you try to insert two records with the same id and name you will get an exception - the error message bellow is in German, but it basically says that the pk constraint was violated:

INSERT INTO users VALUES (1,'foo','add 1');
INSERT INTO users VALUES (1,'foo','add 2');

FEHLER:  doppelter Schlüsselwert verletzt Unique-Constraint »users_pkey«
DETAIL:  Schlüssel »(id, name)=(1, foo)« existiert bereits.

In case you want to update address when id and name already exist, try using an UPSERT:

INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT (id, name) 
DO UPDATE SET address = EXCLUDED.address;

If you want to simply ignore the conflicting insert without raising an exception, just do as follows:

INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT DO NOTHING;

See this answer for more details.

Regarding speed: you have to rather check if your table has a proper index or even if an index makes sense at all when performing the insert. Sometimes importing large amount of data into a temporary UNLOGGED TABLE table without index, and then populating the target table with an SQL removing the duplicates is the best choice.

Problems Add/Update Big Data on PostgresSQL

1 Answers1