1

The idea of this code was to insert new rows into a table after determining which "load_id"(s) are not present.

MERGE INTO stg.my_table AS tgt
using stg.my_view AS src
ON tgt. load_id = src.load_id
WHEN NOT matched THEN
  INSERT (id,
          document_type,
          state,
          publication_date,
          )
  VALUES (src.id,
          src.document_type,
          src.state,
          src.publication_date,
     );

Is there a way to achieve the same result without having to use a MERGE statement? Isn't an INSERT enough?

x89
  • 2,798
  • 5
  • 46
  • 110

1 Answers1

2

Using INSERT INTO:

INSERT INTO stg.my_table  -- column list here
SELECT src.*
FROM stg.my_view AS src
LEFT JOIN stg.my_table AS tgt
  ON tgt.load_id = src.load_id
WHERE tgt.load_id IS NULL;

Also using EXISTS

INSERT INTO stg.my_table  -- column list here
SELECT src.*
FROM stg.my_view AS src
WHERE NOT EXISTS (SELECT *
                  FROM  stg.my_table AS tgt
                  WHERE tgt.load_id = src.load_id);

Extras:

Both patterns above are syntax that are executed as ANTI-JOIN.

enter image description here

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • is there a performance difference between these 3 alternate solutions? – x89 Jan 25 '23 at 18:41
  • @x89 The differences if any should be negligible. In any case you should always compare query profile and execution times. – Lukasz Szozda Jan 25 '23 at 18:43
  • @x89 IMO, `not exists` is a better solution because it communicates the intention of the code better AND also doesn't lead to duplicates that might result from a join when keys are not perfectly unique. – Radagast Jan 25 '23 at 19:36
  • @Rajat `NOT EXISTS` is not necessarily better solution than `LEFT JOIN WHERE tgt.join_col IS NULL`. The behaviour related duplicates is the same, if target contains duplicates it will not produce duplicates, if source contains duplicates both will generate duplicate. [NOT EXISTS vs LEFT JOIN ... WHERE col IS NULL live demo](https://dbfiddle.uk/t1d07vgF) – Lukasz Szozda Jan 25 '23 at 19:54
  • @LukaszSzozda You're correct about the the behavior of left join+null check implementation in your answer, although I would attribute that to the `where` clause rather than the `join`. I was making a broader point that `exists` doesn't cause "cross join by proxy" problem that `inner/left joins` are notorious for when datasets are duplicated on join key. `Exists` will retain existing duplicates, but won't add to that. – Radagast Jan 25 '23 at 20:30
  • "-- column list here" i am guessing it is not essential to specify column names but is better for readability? – x89 Jan 26 '23 at 07:28
  • @x89 It is a good practice to explicitly state column list both in insert into and avoid asterisk. `INSERT INTO trg(col1, col2,..) SELECT src.col1, src.col2, ...` – Lukasz Szozda Jan 26 '23 at 07:29