0

I have a problem like this: I need to optimize the application, with db (postgreSQL), the table looks like this:

CREATE TABLE voter_count(
                    id SERIAL, 
                    name VARCHAR NOT NULL, 
                    birthDate DATE NOT NULL, 
                    count INT NOT NULL, 
                    PRIMARY KEY(id), 
                    UNIQUE (name, birthDate))

I have more than a thousand such voters, and I need to put all of them in the database, but among them there are several duplicates that could vote several times (from 2 to infinity), and I need to, when meeting such a duplicate, increase the count field for an existing one (for a voter with the same name and birthdate). Previously, I just checked whether there is such a voter in the table or not, and if there is, then find it and increase the count.

But the program worked for too long, and I tried to do it through MULTI INSERT and use ON CONFLICT DO UPDATE to increase count, but I get an error, then I asked a question on stackoverflow, and I was offered to do a lot of INSERTs, through a loop, but in PostgreSQL.

INSERT INTO voter_count(name, birthdate, count) 
VALUES 
('Ivan', '1998-08-05', 1), 
('Sergey', '1998-08-29', 1), 
('Ivan', '1998-08-05', 1) 
ON CONFLICT (name, birthdate) DO UPDATE SET count = (voter_count.count + 1)

Question: how to do INSERT in a loop through PostgreSQL.

  • `do a lot of INSERTs, through a loop, but in PostgreSQL.` that's the worst possible way to do this. SQL is a set-oriented language, there are *very* few cases where a loop would help – Panagiotis Kanavos Aug 16 '21 at 06:52
  • The multi-row `INSERT` should work just fine. Please detail the error you got. – Laurenz Albe Aug 16 '21 at 09:00

2 Answers2

2

Probably the best option is to insert before all the data in a table without primary key, for instance:

CREATE TABLE voter_count_with_duplicates(
                name VARCHAR NOT NULL, 
                birthDate DATE NOT NULL)

and then insert the data with a single statement:

INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, COUNT(*)
FROM voter_count_with_duplicates
GROUP BY name, birthDate

Note that if you have the data in a structured text file (for instance a CSV file), you can insert all the data into voter_count_with_duplicates with a single COPY statement.

If you have to insert (a lot of) new data with the table already populated, there are several possibilities. One is to use the solution in the comment. Another one is to perform an an update and an insert:

 WITH present_tuples AS 
  (SELECT name, birthDate, COUNT(*) AS num_of_new_votes
   FROM voter_count_with_duplicates d JOIN voter_count c ON
         v.name = d.name and v.birthDate = d.birthDate
   GROUP BY name, birthDate)
 UPDATE voter_count SET count = count + num_of_new_votes
           FROM present_tuples
           WHERE present_tuples.name = voter_count.name
             AND present_tuples.birthDate = voter_count.birthDate;

WITH new_tuples AS 
  (SELECT name, birthDate, COUNT(*) AS votes
   FROM voter_count_with_duplicates d 
   WHERE NOT EXISTS SELECT * 
                    FROM voter_count c
                    WHERE v.name = d.name and v.birthDate = d.birthDate
   GROUP BY name, birthDate)
INSERT INTO voter_count (name, birthDate, count)
SELECT name, birthDate, votes
FROM new_tuples;
Renzo
  • 26,848
  • 5
  • 49
  • 61
1

What you want to achieve is colloquially called an upsert; insert the row, if it doesn't exist, else update. The operation to use for this is MERGE.

The data set you want to merge into the existing table is the aggregate of your values grouped by name and bithdate with their total sum you want to insert/add.

MERGE INTO voter_count vc
USING 
(
  SELECT name, birthdate, SUM(cnt) as total
  FROM 
  (
    VALUES 
      ('Ivan', DATE '1998-08-05', 1), 
      ('Sergey', DATE '1998-08-29', 1), 
      ('Ivan', DATE '1998-08-05', 1) 
  ) input_data (name, birthdate, cnt)
  GROUP BY name, birthdate
) data ON (data.name = vc.name and data.birthdate = vc.birthdate)
when not matched 
  insert (name, birthdate, count) values (data.name, data.birthdate, data.total)
when matched
  update set count = count + data.total;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • @a_horse_with_no_name: Why not? PostgreSQL features `MERGE`, so why shouldn't this work? – Thorsten Kettner Aug 16 '21 at 06:53
  • [Where in the manual](http://www.postgresql.org/docs/current/static/sql-commands.html) is the MERGE command documented? –  Aug 16 '21 at 06:58
  • @ a_horse_with_no_name: Here: https://www.postgresql.org/message-id/attachment/23520/sql-merge.html and here: https://www.postgresql.org/message-id/attachment/55920/sql-merge.html. However, it seems you are right; in spite of MERGE being documented for PostgreSQL in the two links, it doesn't seem to be supported. Weird. – Thorsten Kettner Aug 16 '21 at 07:02
  • If it's not in the manual, it's not "documented". Those mail attachments were probably part of the discussions on the hackers mailing list regarding the implementation of that feature. –  Aug 16 '21 at 07:12
  • @a_horse_with_no_name: Okay. Thank you. I am deleting my answer. – Thorsten Kettner Aug 16 '21 at 07:55
  • To clarify: `MERGE` has been committed at some point immediately before feature freeze, but the implementation was flawed and the patch was reverted. There have been subsequent attempts to get it committed, but without success. – Laurenz Albe Aug 16 '21 at 08:58