4

SQL Server 2008.

I have few pieces of code working together and the effect is rather strange so I decided to share.

I have a table with emails - for simplicity let as assume there are 2 columns: ID identity(1,1) int primary key email varchar(900)

Also there is a unique key on email column with IGNORE_DUP_KEY = ON.

On the other side of a problem is a merge statement:

merge into dbo.email
using (
     select distinct email t from #t
) p 
ON t = email
when not matched by target then
insert (email) values  (t);

It is worth to note that #t is basically (for purpose of this query) a one column table with email varchar(500).

What is surprising is that the query failed:

"Violation of UNIQUE KEY constraint 'uq_email'. Cannot insert duplicate key in object dbo.email"

However this works flawlessly:

insert into dbo.email (email) select email from #t

While I can obviously work around the problem (the insert statement is faster anyway so it is a way to go) I have no idea why the MERGE statement failed. Any ideas, anyone?

EDIT: Full use case: Step 1:

create table #temp (
col1 varchar(500),
col2 varchar(500),
col3 varchar(500),
col4 varchar(500),
col5 varchar(500),
email varchar(500),
id int)

Step 2:

#temp is populated from CSV file for many purposes

Step 3:

merge into dbo.email using ( select distinct email t from #temp ) p ON t = email when not matched by target then insert (email) values (t);

Step 0 - CREATE script for dbo.email:

CREATE TABLE dbo.email (
id int identity(1,1) not null,
email varchar(900) null,
loaddate date default null,
constraint [PK__email__1111] PRIMARY KEY CLUSTERED
(
    id asc
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY,
CONSTRAINT [uq_email] UNIQUE NONCLUSTERED
(
     EMAIL asc
)
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
) ON PRIMARY

I can't make much more without giving you hunderds of gigabytes of data which is out of a question obviously.

nimdil
  • 1,361
  • 10
  • 20
  • Based on the information you have given the statement should work correctly and not attempt to insert where matching rows already exist. Is there a trigger on the target `email` table? – Ed Harper Feb 10 '12 at 10:37
  • No. There is really nothing else there. I'm also surprised that it failed as I see no reason (obviously). Maybe it is a bug in SQL Server. After all - as my prof used to say - it is just a software :-. – nimdil Feb 10 '12 at 10:44
  • Can you post a complete example which demonstrates the problem? I've created a test based on the information you've provided and it works as expected. – Ed Harper Feb 10 '12 at 11:00
  • A side point - if you need to ignore duplicates, what's the value in having the unique constraint at all? – Ed Harper Feb 10 '12 at 11:17
  • The table needs to have unique emails however there are many data sources inserting data there and they may have duplicates - in themselves and between each other. – nimdil Feb 10 '12 at 11:28
  • In which case I don't understand why you would switch `IGNORE_DUP_KEY = ON` - which means that for `INSERT` statements, it is permitted for duplicate keys (which would otherwise break the constraint) to be inserted - see http://msdn.microsoft.com/en-us/library/ms186869.aspx – Ed Harper Feb 10 '12 at 11:33

1 Answers1

9

Your email column allows for null values. If you have a null value in target and a null value in source they will not be matched in the on expression. You will have a unique constraint exception when you are inserting the second null value.

Try this instead:

merge into dbo.email
using (
     select distinct email t from #t where email is not null
) p 
ON t = email
when not matched by target then
insert (email) values  (t);

Update:
Regarding ignore_dup_key you should read the remarks section in documentation for the merge statement:

If IGNORE_DUP_KEY is set to ON for any unique indexes on the target table, MERGE ignores this setting.

To have a unique constraint in SQL Server that allows for multiple null values you should add a unique filtered index instead.

create unique index UX_xx on TableName(ColName) where ColName is not null
Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
  • 2
    Brilliant catch. Also I will probably never get used to the fact that in SQL Server - unlike i.e. ORacle - unique column allows only one null value. -> However - while it does explain why the merge didn't catch null = null situation, it does not explain why ignore_dup_key won't work in this case. It works for insert statement, after all. – nimdil Feb 10 '12 at 15:15