14

I have created below table with primary key in snowflake and whenever i am trying to insert data into this table, it's allow duplicate records also. How to restrict duplicate id ?

create table tab11(id int primary key not null,grade varchar(10));

insert into tab11 values(1,'A');
insert into tab11 values(1,'B');

select * from tab11;

Output: Inserted duplicate records.

ID  GRADE
1   A
1   B
Razneesh
  • 1,147
  • 3
  • 13
  • 29
Neeraj Yadav
  • 201
  • 1
  • 4
  • 8

6 Answers6

7

Snowflake allows you to identify a column as a Primary Key but it doesn't enforce uniqueness on them. From the documentation here:

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

A Primary Key in Snowflake is purely for informative purposes. I'm not from Snowflake, but I imagine that enforcing uniqueness in Primary Keys does not really align with how Snowflake stores data behind the scenes and it probably would impact insertion speed.

Simon D
  • 5,730
  • 2
  • 17
  • 31
4

You may want to look at using a merge statement to handle what happens when a row with a duplicate PK arrives:

create table tab1(id int primary key not null, grade varchar(10));

insert into tab1 values(1, 'A');

-- Try merging values 1, and 'B': Nothing will be added
merge into tab1 using 
    (select * from (values (1, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- Try merging values 2, and 'B': New row added
merge into tab1 using 
    (select * from (values (2, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- If instead of ignoring dupes, we want to update:
merge into tab1 using 
    (select * from (values (1, 'F'), (2, 'F')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when matched then update set tab1.grade = tab2.grade
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

For more complex merges, you may want to investigate using Snowflake streams (change data capture tables). In addition to the documentation, I have created a SQL script walk through of how to use a stream to keep a staging and prod table in sync:

https://snowflake.pavlik.us/index.php/2020/01/12/snowflake-streams-made-simple

Greg Pavlik
  • 10,089
  • 2
  • 12
  • 29
1

You could try to use SEQUENCE to fit your requirement

https://docs.snowflake.net/manuals/user-guide/querying-sequences.html#using-sequences

sprethepa
  • 544
  • 2
  • 4
  • 1
    Please consider [edit]ing your answer to add some explanation and details. While it might answer the question, just adding links does not help OP or future community members understand the issue or solution. – hongsy Jan 24 '20 at 07:03
1

Snowflake does NOT enforce unique constraints, hence you can only mitigate the issue by:

  • using a SEQUENCE to populate the column you want to be unique;
  • defining the column as NOT NULL (which is effectively enforced);
  • using a stored procedure where you can programmatically ensure no duplicates are introduced;
  • using a stored procedure (which could be run by scheduled Task possibly) to de-duplicate on a regular basis;
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
1

You will have to check for duplicates yourself during the insertion (within your INSERT query).

Greg Pavlik's answer using a MERGE query is one way to do it, but you can also achieve the same result with an INSERT query (if you don't plan on updating the existing rows -- if you do, use MERGE instead)

The idea is to insert with a SELECT that checks for the existence of those keys first, along with a window function to qualify the records and remove duplicates from the insert data itself. Here's an example:

INSERT INTO tab11
SELECT *
FROM (VALUES
  (1,'A'),
  (1,'B')
) AS t(id, grade)
-- Make sure the table doesn't already contain the IDs we're trying to insert
WHERE id NOT IN (
  SELECT id FROM tab11
)
-- Make sure the data we're inserting doesn't contain duplicate IDs
-- If it does, only the first record will be inserted (based on the ORDER BY)
-- Ideally, we would want to order by a timestamp to select the latest record
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY id
  ORDER BY grade ASC
) = 1;

Alternatively, you can achieve the same result with a LEFT JOIN instead of a WHERE NOT IN (...) -- but it doesn't make a big difference unless your table is using a composite primary key (so that you can join on multiple keys).

INSERT INTO tab11
SELECT t.id, t.grade
FROM (VALUES
  (1,'A'),
  (1,'B')
) AS t(id, grade)
LEFT JOIN tab11
  ON tab11.id = t.id
-- Insert only if no match is found in the join (i.e. ID doesn't exit)
WHERE tab11.id IS NULL
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY t.id
  ORDER BY t.grade ASC
) = 1;

Side note: Snowflake is an OLAP database (as opposed to OLTP), and hence is designed for analytical queries & bulk operations (as opposed to operations on individual records). It's not a good idea to insert records one at a time in your table; instead, you should ingest data in bulk into a landing/staging table (possibly using Snowpipe), and use the data in that table to update your destination table (ideally using a table stream).

Marco Roy
  • 4,004
  • 7
  • 34
  • 50
0

Snowflake documentation says it doesnt enforce the constraint.

https://docs.snowflake.com/en/sql-reference/constraints-overview.html

Instead of the load script process to fail, I would rather try and use merge. I have not used merge statements in snowflake yet. For other nosql databases, I have used merge statements instead of insert.

osr
  • 21
  • 1
  • 3