You will have to check for duplicates yourself during the insertion (within your INSERT query).
Greg Pavlik's answer using a MERGE query is one way to do it, but you can also achieve the same result with an INSERT query (if you don't plan on updating the existing rows -- if you do, use MERGE instead).
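For reference, a MERGE covering both cases could look roughly like this (a sketch only, reusing the table and columns from the example below; note that Snowflake rejects a MERGE whose source contains duplicate join keys for an update, hence the QUALIFY deduplication in the source subquery):

MERGE INTO tab11 USING (
    SELECT id, grade
    FROM (VALUES
        (1,'A'),
        (1,'B')
    ) AS v(id, grade)
    -- Deduplicate the source first, or MERGE will fail on duplicate matches
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY grade ASC) = 1
) AS t
ON tab11.id = t.id
WHEN MATCHED THEN UPDATE SET tab11.grade = t.grade
WHEN NOT MATCHED THEN INSERT (id, grade) VALUES (t.id, t.grade);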
The idea is to insert with a SELECT that first checks whether those keys already exist in the table, combined with a window function (in a QUALIFY clause) to deduplicate the insert data itself. Here's an example:
INSERT INTO tab11
SELECT *
FROM (VALUES
    (1,'A'),
    (1,'B')
) AS t(id, grade)
-- Make sure the table doesn't already contain the IDs we're trying to insert
WHERE id NOT IN (
    SELECT id FROM tab11
)
-- Make sure the data we're inserting doesn't contain duplicate IDs
-- If it does, only the first record will be inserted (based on the ORDER BY)
-- Ideally, we would want to order by a timestamp to select the latest record
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY grade ASC
) = 1;
Alternatively, you can achieve the same result with a LEFT JOIN instead of a WHERE id NOT IN (...) -- it doesn't make a big difference unless your table uses a composite primary key, in which case the join lets you match on multiple columns at once.
INSERT INTO tab11
SELECT t.id, t.grade
FROM (VALUES
    (1,'A'),
    (1,'B')
) AS t(id, grade)
LEFT JOIN tab11
    ON tab11.id = t.id
-- Insert only if no match is found in the join (i.e. the ID doesn't exist)
WHERE tab11.id IS NULL
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY t.id
    ORDER BY t.grade ASC
) = 1;
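To illustrate the composite-key case, here's what the join version could look like if the table were keyed on (id, grade) -- a hypothetical variant, not part of the original example:

INSERT INTO tab11
SELECT t.id, t.grade
FROM (VALUES
    (1,'A'),
    (1,'B')
) AS t(id, grade)
LEFT JOIN tab11
    ON  tab11.id    = t.id
    AND tab11.grade = t.grade
-- Insert only if no row matches on the full composite key
WHERE tab11.id IS NULL
-- Deduplicate on the full key as well
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY t.id, t.grade
    ORDER BY t.grade ASC
) = 1;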
Side note: Snowflake is an OLAP database (as opposed to OLTP), and hence is designed for analytical queries & bulk operations (as opposed to operations on individual records). It's not a good idea to insert records one at a time in your table; instead, you should ingest data in bulk into a landing/staging table (possibly using Snowpipe), and use the data in that table to update your destination table (ideally using a table stream).
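To sketch that last pattern (all names here -- landing_tab11 and tab11_stream -- are assumptions for illustration, not part of the original answer):

-- Landing table that Snowpipe (or bulk COPY) loads into
CREATE TABLE landing_tab11 (id INT, grade VARCHAR);

-- Stream that captures changes on the landing table
CREATE STREAM tab11_stream ON TABLE landing_tab11;

-- Periodically merge the captured changes into the destination table;
-- consuming the stream in a DML statement advances its offset
MERGE INTO tab11 USING (
    SELECT id, grade
    FROM tab11_stream
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY grade ASC) = 1
) AS s
ON tab11.id = s.id
WHEN MATCHED THEN UPDATE SET tab11.grade = s.grade
WHEN NOT MATCHED THEN INSERT (id, grade) VALUES (s.id, s.grade);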