I am getting an arithmetic overflow error while inserting values into a lookup table whose row ID is defined as TINYINT. This is NOT a case where the number of unique records exceeds 255. The situation is a bit more unusual, and it did not occur during the first tests of this setup.

The production version of the code below actually has only 66 unique values, but it is possible that new values could be added (slowly and in very small numbers) over time; 255 available slots should be more than enough for the lifespan of this analysis process.

My initial thought was that a cached plan might recognize that the hierarchical source table has more than 255 values (there are in fact 1028) and conclude that the insert could exceed the destination table's capacity. I have tested this, however, and it is not the cause.
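(One way to test that, sketched here for reference; note that DBCC FREEPROCCACHE flushes every cached plan on the instance, so only run it on a test server:)

-- Flush all cached plans, then re-run the failing INSERT;
-- the overflow persists, so a stale cached plan is not the cause.
DBCC FREEPROCCACHE
GO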

-- This table represents a small (tinyint) subset of unique primary values.
CREATE TABLE #tmp_ID10T_Test (
    ID10T_Test_ID tinyint identity(1,1) not null,
    ID10T_String varchar(255) not null,
    PRIMARY KEY CLUSTERED (ID10T_String ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) ON [PRIMARY]



-- This table represents a larger (smallint) set of non-unique source values, defined by a secondary key value (Value_Set).
CREATE TABLE #tmp_ID10T_Values (
    ID10T_Value_ID smallint identity(1,1) not null,
    ID10T_Value_Set tinyint not null,
    ID10T_String varchar(255) not null
) ON [PRIMARY]



-- Create the initial dataset - 100 unique records. The insertion tests below illustrate that the index is working
--  correctly on the primary key field for repetitive values; however, something is happening with the IDENTITY field...
DECLARE @ID10T tinyint
, @i tinyint -- A randomized value to determine which subset of printable ASCII characters will be used for the string.
, @String varchar(255)

SET @ID10T = 0
WHILE @ID10T < 100
    BEGIN

    SET @String = ''
    WHILE LEN(@String) < (1 + ROUND((254 * RAND(CHECKSUM(NEWID()))), 0)) -- Keep growing while shorter than a (re-rolled each pass) random threshold of 1-255.
        BEGIN
            SELECT @i = (1 + ROUND((2 * RAND()), 0)) -- Randomize which printable character subset is drawn from.
            SELECT @String = @String + ISNULL(CASE WHEN @i = 1 THEN char(48 + ROUND(((57-48) * RAND(CHECKSUM(NEWID()))), 0))  -- digits 0-9
                                                   WHEN @i = 2 THEN char(65 + ROUND(((90-65) * RAND(CHECKSUM(NEWID()))), 0))  -- A-Z
                                                   WHEN @i = 3 THEN char(97 + ROUND(((122-97) * RAND(CHECKSUM(NEWID()))), 0)) -- a-z
                                              END, '-')
        END

    INSERT INTO #tmp_ID10T_Values (ID10T_Value_Set, ID10T_String)
    SELECT 1, @String

    SET @ID10T = @ID10T + 1

    END

-- Inspect the generated source values; the IGNORE_DUP_KEY = ON behaviour of the primary key index
--  on the string field is demonstrated by the insert methods below.
SELECT * FROM #tmp_ID10T_Values


-- Method 1 - Simple INSERT INTO: Expect Approx. (100 row(s) affected)
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO


-- Method 2 - LEFT OUTER JOIN WHERE NULL to prevent dupes.
--  this is the test case to determine whether the procedure cache is mixing plans
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT T1.ID10T_String
FROM #tmp_ID10T_Values AS T1
LEFT OUTER JOIN #tmp_ID10T_Test AS t2
ON T1.ID10T_String = T2.ID10T_String
WHERE T2.ID10T_Test_ID IS NULL
GO


-- Repeat Method 1: Duplicate key was ignored (0 row(s) affected).
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO

This does not seem to be a query plan cache issue - if it were, I should have seen the arithmetic error on the first Method 1 retest above.

-- Repeat Method 1: Expected: Arithmetic overflow error converting IDENTITY to data type tinyint.
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO

I am particularly curious why the exception is thrown at this point. I can understand that in Method 1 all 100 unique values are tested, so conceivably the query engine sees a potential 200 records after the second insert attempt. What I DO NOT understand is why it would see a potential 300 records after a third repetition - the second attempt resulted in 0 rows, so at most there should be a potential 200 unique values.

Can someone explain this please?

  • I have also subsequently tested scenarios where I definitely insert fewer than the 255-value limit, using Method 2 defined above: I create a value set with (approx.) 200 unique records and insert it. I then create a value set of 55 new records plus the previous 200 and insert that. The query correctly identifies that 200 of them already exist and does not attempt to insert them; HOWEVER, even THIS method produces the overflow. – user1884677 Apr 22 '13 at 20:46

1 Answer

The sequence of events with IGNORE_DUP_KEY is:

  1. Records are prepared for insert, which includes consuming IDENTITY values. This is what pushes the IDENTITY sequence up to 200 on your first repeat of Method 1.
  2. Records are inserted. IGNORE_DUP_KEY is observed and the duplicate insertions silently fail.

So your final, fully-duplicate INSERT batch actually generates the IDENTITY values 201-300, and anything above 255 overflows your tinyint column.
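A minimal sketch of this behaviour in isolation (the table and values here are illustrative, not from the question; the row-constructor syntax needs SQL Server 2008 or later):

CREATE TABLE #dup_demo (
    id tinyint IDENTITY(1,1) NOT NULL,
    val varchar(10) NOT NULL,
    PRIMARY KEY CLUSTERED (val) WITH (IGNORE_DUP_KEY = ON)
)

INSERT INTO #dup_demo (val) VALUES ('a'), ('b') -- ids 1 and 2 are assigned
INSERT INTO #dup_demo (val) VALUES ('a'), ('b') -- "Duplicate key was ignored." - yet ids 3 and 4 are consumed
INSERT INTO #dup_demo (val) VALUES ('c')        -- gets id 5, not 3

SELECT * FROM #dup_demo
DROP TABLE #dup_demo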

This is trivially verified by running

select ident_current('tempdb..#tmp_ID10T_Test')

liberally throughout your code (the tempdb prefix is needed because it is a temp table). So for your code, annotated below:

-- Method 1 - Simple INSERT INTO: Expect Approx. (100 row(s) affected)
   -- ident_current = 1 beforehand (the seed); 100 afterwards

-- Method 2 - LEFT OUTER JOIN WHERE NULL to prevent dupes.
   -- ident_current = 100 afterwards
   -- NOTE: the SELECT doesn't produce any rows, so no IDENTITY values are consumed

-- Repeat Method 1: Duplicate key was ignored (0 row(s) affected).
   -- ident_current = 200 afterwards
   -- NOTE: the SELECT produced 100 rows, which consumed IDENTITY numbers before being ignored

-- Repeat Method 1: Expected: Arithmetic overflow error converting IDENTITY to data type tinyint.
   -- ident_current = 255
   -- the error is raised when the IDENTITY counter reaches 256
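
Condensed into a runnable check (my arrangement of the question's own statements; the expected values assume a fresh #tmp_ID10T_Test and the 100-row data set above):

INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String FROM #tmp_ID10T_Values
SELECT IDENT_CURRENT('tempdb..#tmp_ID10T_Test') -- 100

INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String FROM #tmp_ID10T_Values
SELECT IDENT_CURRENT('tempdb..#tmp_ID10T_Test') -- 200: 0 rows inserted, but 100 IDENTITY values consumed

INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String FROM #tmp_ID10T_Values
-- Msg 8115: the overflow fires once the internal counter passes 255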
– RichardTheKiwi
  • I suspected that might be the case - what I am not understanding is why the current identity seed is held, or whether I am failing to terminate the batch. For example: the first batch of 100 is entered (the identity value is now 100, with 100 records in the table); the second batch is prepared (100 hypothetical records + 100 hardened records) and consumed. 200 is still under the 255 limit, but no new records are inserted (I am NOT recreating the value set in the basic test, if that was not clear). The third batch is prepared; I still have only 100 original values and 100 duplicates, yet the identity is now 300. – user1884677 Apr 22 '13 at 22:21
  • `why is the current identity seed held`? It is generated to fill in the recordset that forms the basis of the INSERT; the IGNORE_DUP_KEY rejection of the insert comes afterwards. IDENTITY values are NEVER returned to the stream after they have been taken - otherwise you would have serious concurrency issues on a high-write transactional table. – RichardTheKiwi Apr 22 '13 at 22:24
  • What can reset it then? It would be absurd for a table to essentially be filled with ghost records that failed a uniqueness check on insert. – user1884677 Apr 22 '13 at 22:38
  • The thing is that identity sequences have special use and meaning, and you're attempting to subvert them for a usage that was not intended. You can reseed using DBCC CHECKIDENT with an appropriate value, e.g. from a `select max(..)` value (see the sketch after this thread). You already know how to use the LEFT JOIN to filter duplicates from being inserted (or attempted) in the first place. – RichardTheKiwi Apr 22 '13 at 23:57
  • Thanks for your answers, RichardTheKiwi - you have helped me clear up some doubt raised by a very vehement argument against using surrogate keys as the clustered primary key. I should know better, as I have been doing this for 7 years, but without formal training I occasionally defer to people with fancy letters after their name who make compelling arguments ;) - sometimes the tinker knows more than the academic, eh? – user1884677 Apr 24 '13 at 06:10
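
For completeness, a sketch of the DBCC CHECKIDENT reseed mentioned above (my illustration against the question's table; run it after ignored-duplicate batches have burned values):

-- Pull the identity counter back down to the highest value actually stored,
-- reclaiming the numbers burned by ignored duplicates.
DECLARE @max_id tinyint
SELECT @max_id = ISNULL(MAX(ID10T_Test_ID), 0) FROM #tmp_ID10T_Test
DBCC CHECKIDENT ('#tmp_ID10T_Test', RESEED, @max_id) -- the next insert gets @max_id + 1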