
I'm implementing a queue in SQL Server (2008 R2) containing jobs that are to be performed. Upon completion, a job is moved to a history table, with a flag set to success or failure. The items in the queue table have an identity column as their primary key; the history table uses a combination of this ID and a timestamp as its PK.

If a job fails, I would like the option to re-run it, and the way this is intended to work is to move the row from the history table back into the live queue. For traceability purposes, I would like the reinserted row to have the same ID as the original entry, which causes problems since this is an identity column.

I see two possible solutions:

1) Use IDENTITY_INSERT:

SET IDENTITY_INSERT TableName ON

-- Move from history to live queue

SET IDENTITY_INSERT TableName OFF

2) Create some custom logic to generate unique IDs, e.g. getting the maximum ID value from both the live and history queues and adding one.
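For reference, option 2 would look something like this (table and column names are my assumptions, not from an actual schema); note that without a serializable transaction or an application lock, two concurrent callers can compute the same value:

```sql
-- Hypothetical schema: JobQueue(JobID, ...), JobHistory(JobID, CompletedAt, ...)
DECLARE @NewID int;

SELECT @NewID = MAX(ID) + 1
FROM (
    SELECT MAX(JobID) AS ID FROM JobQueue
    UNION ALL
    SELECT MAX(JobID) FROM JobHistory
) AS AllIDs;

-- Race condition: another connection may claim the same @NewID
-- before the subsequent INSERT commits.
```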

I don't see any real problems with 2 apart from it being messy, possibly poor performance and that it makes my neurotic skin crawl...

Option 1 I like, but I don't know the implications well enough. How will this perform? I also know that doing this to two tables at the same time will make things crash and burn. What happens if two threads do this to the same table at the same time?

Is this at all a good way to do this for semi-commonly used stored procedures, or should this technique just be used for batch inserting data once in a blue moon?

Any thoughts on which is the best option, or is there a better way?

Christian Rygg
    When you rerun the job and need to copy it from history table why do you need to keep the same ID? – Alex Aza May 18 '11 at 06:46
  • I have another table with info about each run (each time a job is performed) with performance info and time stamps. When re-running a job, I would like to be able to connect all runs to the same job to be able to see a full history of what has happened. – Christian Rygg May 18 '11 at 06:52

4 Answers


I'd go with Option 1 - Use IDENTITY_INSERT

SET IDENTITY_INSERT TableName ON

-- Move from history to live queue

SET IDENTITY_INSERT TableName OFF

IDENTITY_INSERT is a setting that applies to the current connection - so if another connection is doing the same thing, there is no impact. The only error you can get from using it is if you attempt to set it ON for another table without first turning it OFF on the first table.
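A sketch of the whole move (table names, column names, and the @JobID parameter are assumptions based on the question, not a definitive implementation):

```sql
SET IDENTITY_INSERT JobQueue ON;
-- Note: only one table per session can have IDENTITY_INSERT ON at a time;
-- turning it ON for a second table without turning it OFF first raises an error.

BEGIN TRANSACTION;

-- Re-insert the failed job with its original ID.
-- With IDENTITY_INSERT ON, the column list must be explicit.
INSERT INTO JobQueue (JobID, Payload)
SELECT JobID, Payload
FROM JobHistory
WHERE JobID = @JobID
  AND Succeeded = 0;

COMMIT TRANSACTION;

SET IDENTITY_INSERT JobQueue OFF;
```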

Damien_The_Unbeliever
  • Ah, that makes it a lot easier to work with :) I thought perhaps I would get problems with other SPs etc. Does this affect performance much? This will be a table with a lot of traffic, but the SP described here will rarely be used... – Christian Rygg May 18 '11 at 07:02
  • @Chris - I wouldn't expect any performance issues, but if you're worried about it, the only way to answer the question is to profile it with appropriate data (and volumes of data). In and of itself, it shouldn't cause any issues – Damien_The_Unbeliever May 18 '11 at 07:06
  • Cheers! I was hoping this would be a good option, and now I feel confident it is :-) – Christian Rygg May 18 '11 at 07:08

Can't you use the original (live) identity value to insert into the history table? You say you combine it with a timestamp anyway.
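In other words (hypothetical DDL), the composite key already allows several history rows per job:

```sql
CREATE TABLE JobHistory (
    JobID       int      NOT NULL,  -- original identity value from the live queue
    CompletedAt datetime NOT NULL,  -- timestamp distinguishes repeated runs of the same job
    Succeeded   bit      NOT NULL,
    CONSTRAINT PK_JobHistory PRIMARY KEY (JobID, CompletedAt)
);
```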

Mr47
  • I think the issue is the moving of the failed job back into the live job table rather than PKs. – Will A May 18 '11 at 06:50
  • I do use the original live ID, but as re-running is allowed, you could end up with multiple history entries for the same job (this may not be 100% logical in this case, but it is the standard way to do it in the project). – Christian Rygg May 18 '11 at 06:54
  • Wouldn't the timestamp account for this? – Mr47 May 18 '11 at 06:57
  • I'm not sure if we're talking past each other here - but that's exactly why I have the time stamp...? – Christian Rygg May 18 '11 at 06:59

Assuming that the Queue's Identity column is the one assigning "Job IDs", I would think the simplest solution would be to add a new "OriginalJobID" nullable column, potentially with FK pointing to the history table. Then when you are rerunning a job, allow it to get a new ID as it is added to the queue, but have it keep a reference to the original job in this new column.
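A minimal sketch of this approach (all names are hypothetical; note that a foreign key to the history table would need to reference its full composite key, so it is omitted here):

```sql
-- Add a nullable column recording which job this is a re-run of.
ALTER TABLE JobQueue
    ADD OriginalJobID int NULL;

-- When re-running, let the identity column assign a fresh JobID
-- and keep the reference to the original job.
INSERT INTO JobQueue (Payload, OriginalJobID)
SELECT Payload, JobID
FROM JobHistory
WHERE JobID = @FailedJobID;
```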

To answer "or should this technique just be used for batch inserting data once in a blue moon", I would say yes, definitely, that's exactly what it's for.


Oops, @Damien_The_Unbeliever is right, I'd forgotten that the IDENTITY_INSERT setting is per connection. It would be complicated to get yourself into real trouble with the identity insert approach (would take something like MARS I guess, or bad error-handling). Nonetheless, I think trying to reuse IDs is a mistake!

Tao
  • I have thought of this too, but if the IDENTITY_INSERT option is usable, I find this to be a much cleaner and neater way of doing it... – Christian Rygg May 18 '11 at 07:04

I can see a potential performance issue when reusing identity values: it arises if the identity column is indexed by a clustered index.

A strictly increasing number causes inserted rows to always be appended at the end of the clustered index, so no page splits occur.

If you start inserting reused numbers, you may cause page splits during those insertions. Whether that is a problem depends on your domain.

Erik Rydgren