22

I have stock_price_alert table with 3 columns. stock_price_id is PRIMARY KEY & also FOREIGN KEY to other table. Table definition as below:

create table stock_price_alert (
    stock_price_id integer references stock_price (id) on delete cascade not null,
    fall_below_alert boolean not null,
    rise_above_alert boolean not null,
    primary key (stock_price_id)
);

I need to either:

1) INSERT record if not exist

-- query 1
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false);

2) UPDATE record if exist

-- query 2
UPDATE stock_price_alert SET
    fall_below_alert = true,
    rise_above_alert = false
WHERE stock_price_id = 1;

First I need to issue SELECT query on stock_price_alert table, in order to decide whether to perform query (1) or (2).

Postgres supports INSERT INTO TABLE .... ON CONFLICT DO UPDATE ...:

-- query 3
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false)
ON CONFLICT (stock_price_id) DO UPDATE SET
    fall_below_alert = EXCLUDED.fall_below_alert,
    rise_above_alert = EXCLUDED.rise_above_alert;

Instead of using query (1) or (2), can I always use query (3)? Then I don't need to issue SELECT query in prior & it helps to simplify the code.

But I am wondering, which is the best practice? Will query (3) cause performance issue or unwanted side effect? Thanks.

Shuwn Yuan Tee
  • 5,578
  • 6
  • 28
  • 42

2 Answers2

23

Query 3 is the Postgres syntax for "UPSERT" (= UPDATE or INSERT), introduced in Postgres 9.5.

From the documentation:

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known as UPSERT – “UPDATE or INSERT”.

This is the best practice for what you are trying to achieve.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
Yoni Rabinovitch
  • 5,171
  • 1
  • 23
  • 34
0

I noticed/tested that is much faster for INSERTS (have yet to test UPSERTS) to use a WHERE NOT EXISTS in addition to ON CONFLICT. Typically about 3x faster than just allowing the ON CONFLICT to handle existence checks. I think this may carry over into UPSERTS, making it likely faster to do an INSERT and then and UPDATE. Here is my test for inserts only...

    --so i can keep rerunning
    DROP TABLE if exists temp1;
    DROP TABLE if exists temp2;

    --create a billion rows
    SELECT GENERATE_SERIES AS id INTO TEMP temp1
    FROM GENERATE_SERIES(1, 10000000);

    CREATE UNIQUE INDEX ux_id  ON temp1(id);
    ALTER TABLE temp1 CLUSTER ON ux_id;

    --create a second table to insert from, with the same data
    SELECT * INTO TEMP temp2 
    FROM temp1;

    CREATE UNIQUE INDEX ux_id2  ON temp2(id);
    ALTER TABLE temp2 CLUSTER ON ux_id2;

    --test inserting with on conflict only
    INSERT INTO temp1(id)
    SELECT id
    FROM temp2 ON conflict DO nothing;
    --execution time: 14.71s (1million rows)

    --test inserting with not exists and on conflict
    INSERT INTO temp1(id)
    SELECT t2.id
    FROM temp2 t2
    WHERE NOT EXISTS (SELECT 1 FROM temp1 t1 WHERE t2.id = t1.id) 
    --ON conflict DO nothing;
    --execution time: 5.78s (1million rows)
Jeremy Giaco
  • 342
  • 3
  • 5
  • 2
    I've been using postgresql for only a few weeks, so take my findings with a grain of salt... – Jeremy Giaco Jul 19 '18 at 18:53
  • 9
    In your example you don't do UPSERT. `WHERE NOT EXISTS` automatically eliminates `ON conflict DO nothing`. The difference between your examples is that in first case you catches exceptions via `ON conflict DO nothing` while in 2nd one you don't allow them to arise. Which is obviously cheaper and quicker – Ameba Brain Jun 13 '19 at 12:42
  • 1
    I realize this is an old post, but i was just going through my comments and noticed your reply. I think your statement "...Which is obviously cheaper and quicker" is the point. I was proving it, versus assuming it was obvious. The highest rated answer states that "ON CONFLICT DO NOTHING" is best practice. If it's slower and more expensive, why would that be best practice? – Jeremy Giaco Apr 04 '23 at 20:56