10

I'd like to generate a single sql query to mass-insert a series of rows that don't exist on a table. My current setup makes a new query for each record insertion similar to the solution detailed in WHERE NOT EXISTS in PostgreSQL gives syntax error, but I'd like to move this to a single query to optimize performance since my current setup could generate several hundred queries at a time. Right now I'm trying something like the example I've added below:

INSERT INTO users (first_name, last_name, uid) 
SELECT ( 'John', 'Doe', '3sldkjfksjd'), ( 'Jane', 'Doe', 'adslkejkdsjfds')
WHERE NOT EXISTS (
  SELECT * FROM users WHERE uid IN ('3sldkjfksjd', 'adslkejkdsjfds')
)

Postgres returns the following error:

PG::Error: ERROR:  INSERT has more target columns than expressions

The problem is that PostgresQL doesn't seem to want to take a series of values when using SELECT. Conversely, I can make the insertions using VALUES, but I can't then prevent duplicates from being generated using WHERE NOT EXISTS.

http://www.techonthenet.com/postgresql/insert.php suggests in the section EXAMPLE - USING SUB-SELECT that multiple records should be insertable from another referenced table using SELECT, so I'm wondering why I can't seem to pass in a series of values to insert. The values I'm passing are coming from an external API, so I need to generate the values to insert by hand.

Community
  • 1
  • 1
Paul
  • 2,021
  • 5
  • 22
  • 33
  • `SELECT ( 'John', 'Doe', '3sldkjfksjd'), ( 'Jane', 'Doe', 'adslkejkdsjfds')` creates two columns which are "structures" (that have 3 fields each), not two rows with 3 columns. –  Jul 15 '14 at 22:24
  • what's the correct syntax here? Sorry I'm a SQL newbie – Paul Jul 15 '14 at 22:26

2 Answers2

33

Your select is not doing what you think it does.

The most compact version in PostgreSQL would be something like this:

with data(first_name, last_name, uid)  as (
   values
      ( 'John', 'Doe', '3sldkjfksjd'),
      ( 'Jane', 'Doe', 'adslkejkdsjfds')
) 
insert into users (first_name, last_name, uid) 
select d.first_name, d.last_name, d.uid
from data d
where not exists (select 1
                  from users u2
                  where u2.uid = d.uid);

Which is pretty much equivalent to:

insert into users (first_name, last_name, uid) 
select d.first_name, d.last_name, d.uid
from (
   select 'John' as first_name, 'Doe' as last_name, '3sldkjfksjd' as uid
   union all
   select 'Jane', 'Doe', 'adslkejkdsjfds'
) as d
where not exists (select 1
                  from users u2
                  where u2.uid = d.uid);
Jordan Parmer
  • 36,042
  • 30
  • 97
  • 119
  • so i'm creating a CTE which I will then reference in the query. – Paul Jul 15 '14 at 22:34
  • 1
    @Paul: correct, you need to build a result somehow, and the only way to do that is the `values` clause or a `select` with constant values. –  Jul 15 '14 at 22:35
  • 1
    with data as (first_name, last_name, uid) as () is actually generating a syntax error. I think with data (first_name, last_name, uid) as ( ) is correct. – Paul Jul 15 '14 at 23:21
  • 1
    Is it possible to get the id of all the newly inserted records as well as the previously existing ones? – John William Domingo Mar 25 '16 at 07:15
  • @a_horse_with_no_name: Thanks so much, this answer helped a lot because most examples only show inserting a single value via select. – romm Dec 03 '19 at 10:30
2

a_horse_with_no_name's answer actually has a syntax error, missing a final closing right parens, but other than that is the correct way to do this.

Update: For anyone coming to this with a situation like mine, if you have columns that need to be type cast (for instance timestamps or uuids or jsonb in PG 9.5), you must declare that in the values you pass to the query:

-- insert multiple if not exists
-- where another_column_name is of type uuid, with strings cast as uuids
-- where created_at and updated_at is of type timestamp, with strings cast as timestamps
WITH data (id, some_column_name, another_column_name, created_at, updated_at) AS (
  VALUES
    (<id value>, <some_column_name_value>, 'a5fa7660-8273-4ffd-b832-d94f081a4661'::uuid, '2016-06-13T12:15:27.552-07:00'::timestamp, '2016-06-13T12:15:27.879-07:00'::timestamp),
    (<id value>, <some_column_name_value>, 'b9b17117-1e90-45c5-8f62-d03412d407dd'::uuid, '2016-06-13T12:08:17.683-07:00'::timestamp, '2016-06-13T12:08:17.801-07:00'::timestamp)
)
INSERT INTO table_name (id, some_column_name, another_column_name, created_at, updated_at)
SELECT d.id, d.survey_id, d.arrival_uuid, d.gf_created_at, d.gf_updated_at
FROM data d
WHERE NOT EXISTS (SELECT 1 FROM table_name t WHERE t.id = d.id);

a_horse_with_no_name's answer saved me today on a project, but had to make these tweaks to make it perfect.

Community
  • 1
  • 1