14

Edit: My question is, "Why does my first code example work?" Please read on...

Edit1: There is no doubt that a unique constraint is the correct way to ensure duplicates don't happen. This is a given. However, sometime we need to know that we're attempting a duplicate entry. Further, this post goes beyond merely handling duplicates.

This is potentially a repeat of probably over 100 questions on SO. I've read an endless array of confusions and contradictions about the simple questions of atomic updates, locks, and concurrency.

I see that blogs and experts disagree widely on these points. Here I provide test code based on the various solutions people have advised, indicate the results, state my views, and invite your comment.

Context: I'm running SQL Server 2008 Express SP2.

I created the following test table:

create table dbo.Temp (Col int)

The table deliberately has no constraints on it as we want to test the SQL code ideas, not constraints.

I ran the following concurrently in 2 and then 3 query windows:

declare @i int
set @i = 0

while @i < 5000 begin
    set @i = @i + 1
    update dbo.temp set Col = (SELECT Col from dbo.Temp) + 1
end

I did not use any explicit locking, as can be seen. All DB settings are default. I checked the value of Col and it was the desired number: 25,000. Nothing missed.

Since SQL Server is ACID, the "A" tells us that a single statement is executed atomically. Therefore, based on the above, we could agree with those who say that locks are not needed for a simple update as above.

Next, I ran the following concurrently in 3 query windows:

while @i < 5000 begin
    set @i = @i + 1
    insert into dbo.temp select @i where not exists
      (select 1 from dbo.temp where Col = @i)
end

The results were not correct, despite the fact this is one statement. There were missing values, duplicate values, and > 5k rows.

Next, I ran the following popular solution concurrently in 3 query windows:

declare @i int
set @i = 0

while @i < 5000 begin
    set @i = @i + 1
    insert into dbo.temp select @i where not exists 
      (select 1 from dbo.temp with (updlock) where Col = @i)
end

The results were not correct. There were missing values, duplicate values, and > 5k rows.

Next, for those who have doubts that SQL Server implicitly wraps single statements in a transaction (aka "A" in ACID):

declare @i int
set @i = 0

while @i < 5000 begin
    set @i = @i + 1
    begin tran
    insert into dbo.temp select @i where not exists 
      (select 1 from dbo.temp with (updlock) where Col = @i)
    commit tran
end

Same incorrect results.

Next, I ran the following concurrently in 3 query windows:

declare @i int
set @i = 0

while @i < 5000 begin
    set @i = @i + 1
    insert into dbo.temp select @i where not exists 
      (select 1 from dbo.temp with (XLOCK, ROWLOCK) where Col = @i)
end

This worked. 5k unique values only.

Next, the following in 3 windows:

declare @i int
set @i = 0

while @i < 5000 begin
    set @i = @i + 1
    merge dbo.temp as t
    using (select @i) as test (Col)
    ON (t.Col = test.Col)
    when not matched then
        insert values (@i);
end

This worked. 5k unique values only.

My conclusions are:

  1. SQL Server is atomic for single SQL operations, not groups of operations that appear as one statement.
  2. UPDLOCK is not effective, as shown and as suggested by many, at providing the required locking needed to guarantee update integrity in a contentious environment.
  3. From a locking perspective, only XLOCK guarantees concurrency integrity and atomicity in multi-statement operations as shown.
  4. The MERGE command is a single command, and therefore is atomic.

Please test this for yourself.

Now, can someone explain to me why my first example worked? :-)

IamIC
  • 17,747
  • 20
  • 91
  • 154
  • The best solution here would be to use a INT IDENTITY to let SQL Server handle the uniqueness of the INT column....... but that's a well know fact, nothing new here, really..... – marc_s Dec 11 '10 at 09:48
  • Of course. I always use indexes and allow indexes to genuinely guarantee uniqueness. However, obviously sometimes we need to know a row exists and process accordingly, in which case we have to test. Do you have any idea why e.g. #1 works? – IamIC Dec 11 '10 at 09:51
  • @Marc, as a side note, I got into a conversation with you on another post, but we never concluded. It was about a CTE I wrote. You indicated it was recursive, but I couldn't see why. Since you def. know your stuff, I have attention on this. http://stackoverflow.com/questions/4379039/get-xml-nodes-from-sql-server-column-as-comma-separated-list/4379166#4379166 – IamIC Dec 11 '10 at 09:53
  • @Mrac thanks - email sent :-) – IamIC Dec 11 '10 at 10:28
  • 6
    Odd that a 3-time favorited question is closed as "not a question". – IamIC Dec 13 '10 at 04:56
  • @IanC: Grumpy moderators were over-tired and didn't have their juice. – Ian Boyd Oct 05 '11 at 20:45
  • @IanBoyd I believe you're correct. – IamIC Oct 07 '11 at 23:42
  • I don't know how to ask this in the correct form but is there a way I can use this 'not a real question' to reference other 'real questions' that I might have? It is a great article that can be expounded on and continues to provide insight in the year 2018. –  May 30 '18 at 15:21

0 Answers0