
Similar Question: How to automatically reseed after using identity_insert?

I have a stored procedure that runs once a day. Part of this procedure deletes ~1 million rows from a table and re-inserts a similar number of rows (there are millions of other rows that are not affected); once it finishes, it rebuilds all the indexes on the table ready for use.

Performance-wise I have no problem; the only annoyance is the excessive size of the primary key, which is an IDENTITY INT. Obviously, deleting a million rows and then re-inserting leaves me with a ~1,000,000 gap every day.

So, I want to reseed the ID column after the DELETE, prior to the INSERT.

I know that I can use CHECKIDENT to reseed the identity column - but only if the new seed value is known before calling CHECKIDENT, i.e. by using MAX() immediately prior to the CHECKIDENT.

My table is a very high-traffic table. Whilst it's unlikely, there is a small chance that other inserts / deletes may be attempted during this process, therefore I am a bit worried about resorting to MAX()+1 to determine my new seed value (i.e. a row being inserted from elsewhere after the MAX() was performed but prior to the completion of CHECKIDENT, which would result in an error), e.g.:

DECLARE @NewSeed INT 
SELECT  @NewSeed = ISNULL(MAX(IdentityColumn), 0)
FROM    tbl_Whatever 
DBCC CHECKIDENT('tbl_Whatever', RESEED, @NewSeed) 

My question: is there a slicker way to perform a reseed based on the current max value of the identity column? I'm assuming the answer is no, but I thought I'd ask.

SQL Server version is 2008 R2.

HeavenCore
  • Just use BIGINT? If just over 2 billion positive values are not enough, BIGINT yields just over 9 billion billion positive values. – MatBailie Mar 30 '12 at 12:31
  • INT is more than big enough if I avoid the gaps. Switching to BIGINT would just be a pointless extra 4 bytes per row (when considering 5+ million rows, that's a lot of wasted bytes (20MB?)). Also, it's the practicality aspect - for instance, on interfaces / HTML tables / reports we don't want these massive numbers & confusing gaps – HeavenCore Mar 30 '12 at 12:34
  • This is a bit off-topic - sorry - when you say that not all records are deleted, are you guaranteed that the untouched records don't have high ID values? If you leave even one record, and that has the highest ID, you get no benefit here. *[Back on topic; I think you may have the only way of doing this.]* – MatBailie Mar 30 '12 at 12:41
  • Hi Dems, a valid question. But yes, I'm sure; the table in question is date-based (bookings for the next 2+ years, based on meta information elsewhere), and rows are deleted based on date and re-inserted with updated versions. Only rows dated getdate() + 1 day or later are deleted; all rows before getdate() + 1 are left alone, as these are in the past and final - sorry if that's a little vague - just trying to keep it simple :) - if you're sure there is no better way of doing it, go ahead and post that as the answer and I'll mark accordingly. – HeavenCore Mar 30 '12 at 12:46
  • Why are you loading so many records that you want to delete? Why not make your load process only pick the records you want to keep or load to a staging table and then do the processing there and then send to the prod table? – HLGEM Mar 30 '12 at 14:00
  • If you need to keep reseeding it, why have an ident? If you are reusing the IDs to have new meanings, then it is not a valid primary key and should not be used as such. Therefore, you must have a different primary key. What purpose does the ident serve? – cadrell0 Mar 30 '12 at 14:09
  • @cadrell0 Reused or not, it is still a valid unique identifier with as much meaning as a fresh identity value. By that argument an identity is not a valid PK at all. An identity PK has use in a PK/FK relationship. An identity serves as a tie-breaker in a sort to give consistent results. And if I want to remove duplicates but keep one, I need a PK. – paparazzo Mar 30 '12 at 18:17
  • @Blam So long as the OP cascades deletes or doesn't have FKs, it is valid to reuse the ident. However, reusing keys to have a completely different meaning just seems like a really bad idea and I would avoid it. – cadrell0 Mar 30 '12 at 18:20
  • @cadrell0 The identity is vital for the millions of rows where the row's timestamp is < getdate() - these rows are never deleted. The data in question is an auto-generated register of attendance of sorts, based on rotas & other meta information. This meta info changes every day - rather than tracking changes, it's much easier (and faster) to re-insert the rows than to track partial changes (e.g. avoiding a set of UPDATE, INSERT and DELETE statements with one big DELETE & INSERT) – HeavenCore Mar 30 '12 at 19:40
  • @cadrell0 If there is a FK relationship then the primary key cannot be deleted until the FK row is first deleted. It is called declarative referential integrity (DRI). It is an identity. A reused value has just as much meaning as a new one - uniqueness. – paparazzo Mar 30 '12 at 19:51

2 Answers


Use a transaction to update the seed. This operation is so fast that you don't have to worry about locks.
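To make the idea concrete, here is a minimal sketch of the question's own snippet wrapped in a transaction. The TABLOCKX/HOLDLOCK hint is an assumption on my part (not stated in the answer) to keep other sessions from inserting between the MAX() and the reseed; table and column names are taken from the question:

```sql
BEGIN TRANSACTION;

DECLARE @NewSeed INT;

-- Exclusive table lock held for the rest of the transaction, so no
-- other session can insert between reading MAX() and the reseed.
SELECT  @NewSeed = ISNULL(MAX(IdentityColumn), 0)
FROM    tbl_Whatever WITH (TABLOCKX, HOLDLOCK);

DBCC CHECKIDENT ('tbl_Whatever', RESEED, @NewSeed);

-- The daily INSERT can run here, inside the same transaction,
-- so the new rows start at @NewSeed + 1 with no gap.

COMMIT TRANSACTION;
```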

JotaBe
  • Just to add to the answer: I am pretty sure that if you put both the SELECT statement that gets the max ID and the INSERT statement within the same transaction, SQL Server will not allow updates to the table until the transaction has been committed. – Jive Boogie Mar 30 '12 at 13:32
  • That's why I say: "This operation is so fast that you don't have to worry about locks". – JotaBe Mar 30 '12 at 17:36
  • I am not saying you won't have to worry about locks because it is so fast, but that while the transaction is open a lock will be held on the table that will prevent any inserts from being made until the transaction is committed. – Jive Boogie Mar 30 '12 at 17:59
  • Agreed, wrapping the reseed and the insert into one transaction seems to be the best option - marking as answer. – HeavenCore Mar 30 '12 at 19:32

As Dems said, setting the identity to MAX + 1 does not eliminate gaps.
This is more complex, but it does fill in the gaps:

  1. Delete
  2. Reset the identity (in a transaction)
  3. Determine if there are gaps
  4. If there are gaps, insert rows with SET IDENTITY_INSERT ON and fill in the gaps

Clearly the stored procedure needs to determine the next gap value.
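One way to sketch that gap-finding step (using the question's table and column names; this is an illustration, not part of the original answer) is a self-anti-join that returns each run of missing IDs, which works on 2008 R2:

```sql
-- For every ID whose successor is missing (and that isn't the max),
-- report the start and end of the run of missing IDs after it.
SELECT  t.IdentityColumn + 1 AS GapStart,
        (SELECT MIN(t2.IdentityColumn) - 1
         FROM   tbl_Whatever t2
         WHERE  t2.IdentityColumn > t.IdentityColumn) AS GapEnd
FROM    tbl_Whatever t
WHERE   NOT EXISTS (SELECT 1
                    FROM   tbl_Whatever t3
                    WHERE  t3.IdentityColumn = t.IdentityColumn + 1)
  AND   t.IdentityColumn < (SELECT MAX(IdentityColumn) FROM tbl_Whatever);
```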

If you are inserting as many as you delete you don't even need to reset the identity.

I think you can even have other inserts hot, and they will use the automatic identity.

From the documentation: http://msdn.microsoft.com/en-us/library/ms188059.aspx

At any time, only one table in a session can have the IDENTITY_INSERT property set to ON. If a table already has this property set to ON, and a SET IDENTITY_INSERT ON statement is issued for another table, SQL Server returns an error message that states SET IDENTITY_INSERT is already ON and reports the table it is set ON for.

If the value inserted is larger than the current identity value for the table, SQL Server automatically uses the new inserted value as the current identity value.
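The last quoted point is the one that makes gap-filling safe alongside the automatic identity. A small illustration (tbl_Whatever and SomeColumn are placeholder names, not from the original answer):

```sql
-- Fill one gap with an explicit ID.
SET IDENTITY_INSERT tbl_Whatever ON;

INSERT INTO tbl_Whatever (IdentityColumn, SomeColumn)
VALUES (500, 'filled gap');

SET IDENTITY_INSERT tbl_Whatever OFF;

-- Per the quoted documentation: if 500 is above the table's current
-- identity value, the next plain INSERT (no IDENTITY_INSERT) gets 501;
-- if 500 merely fills a gap below the current value, the automatic
-- sequence is unaffected.
```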

paparazzo
  • If you create a stored procedure which needs to calculate and insert values in the identity column, to fill its gaps, why wouldn't you use a regular column without identity? That's overkill. – JotaBe Mar 30 '12 at 23:26
  • @JotaBe Overkill?? Read the problem statement. This is an active table with other inserts and deletes. I even mention you can have other inserts hot. The large gaps are caused by deletes from this stored procedure. The inserts from the stored procedure are a controlled environment that could close the gaps. The individual inserts could still use the convenience of an identity. – paparazzo Mar 31 '12 at 03:56
  • What I mean is that you don't need an identity column if you're going to use a procedure to fill gaps. In that case, if you use a regular column, you get rid of all the extra work of setting identity insert on/off and reseeding. – JotaBe Mar 31 '12 at 13:50
  • @JotaBe Is this a correct summary of your position? Leaving the existing structure and modifying one stored procedure is overkill, yet changing the table structure and modifying ALL table insert commands is underkill. – paparazzo Apr 01 '12 at 17:08