5

I have a table with both an IDENTITY column and a DateTime column set by GetDate() like this:

CREATE TABLE [MyTable](
  [Id] [int] IDENTITY(1,1) ,
  [InsertTime] [datetime]  DEFAULT (getdate()),
  [OtherValues] [int] 
)

All the INSERTs are preformed with default values for the IDENTITY and DateTime column like this:

INSERT INTO [MyTable] ([OtherValues]) VALUES (1)

always as standalone statements outside any explicit Transaction.

I would expect that Id would be strict increasing and InsertTime also be increasing but not strict. But with heavy load we see a few instances like this:

| Id   | InsertTime              |
|------|-------------------------|
| 3740 | 2015-03-05 10:07:25.560 | 
| 3741 | 2015-03-05 10:07:25.557 |
| 3742 | 2015-03-05 10:07:25.577 |

where we have a slight drop in InsertTime.

Does anyone know how this happens and what is the "right" order of the rows?

Per Jakobsen
  • 3,767
  • 17
  • 19
  • 1
    I agree with Lasse. I think this is two races which had different winners. As such, both inserts were happening at the "same time" so far as the recorded information is concerned so there's no "right" order to be obtained from this data - pick an arbitrary rule and apply it. It won't be "right" but it won't be "wrong" either. – Damien_The_Unbeliever May 18 '15 at 12:28
  • If race conditions is somewhat linked to this I'm going out on a limb here and stating that this will be a very difficult problem to tackle. You would effectively have to limit the processing of these rows to a single-user environment, but then you have just pushed this exact problem up one level. What happens if one client has a slower link to the server than the other, the start of the request may arrive first but the end of the request may arrive after, should it be recorded first or last? Last will effectively make slow links a bottleneck. – Lasse V. Karlsen May 18 '15 at 13:37
  • Please be aware that both Damien_The_Unbeliever and @LasseV.Karlsen are using the term "race" and "race condition" to mean something very different than is typically meant when talking about "race conditions" in database systems. Race Conditions result in deadlocks or invalid data. Here the system is working correctly. – Hogan May 19 '15 at 13:42

4 Answers4

0

There can be issues with both identity columns as well as DateTime columns that could cause unexpected results, however, for determining the order of inserts I recommend going with the identity value and not the DateTime value.

While DateTime should theoretically always increase as time passes, due to the way the current time is retrieved from the system there could be a lack of precision. For all of the gritty details, here's a link that explains a little of what is going on.

With the identity column, it is typically expected to be at least ascending, even if not sequential. Be advised, however, that the identity column could also lead to unexpected results if the seed is changed, the increment changed, the server restarted, etc. Microsoft has provided a list of what is and is not guaranteed for identity values on this page.

The other thing to keep in mind is that if you're using transactions, an identity can be generated and taken by that transaction and then a subsequent transaction can take the next identify value and could theoretically end up committing its changes prior to the first transaction.

John Hodge
  • 1,645
  • 1
  • 13
  • 13
  • Downvote reason: The answer is wrong. the `Datetime` is the problem, not the `identity`. While it is true that Identity columns are not guaranteed to be sequential, they are guaranteed to be acceding (or descending if increment was defined as a negative value). the 0.003 deviation from the expected data that the OP presented is well within the accuracy declared by Microsoft of the `DateTime` data type. – Zohar Peled May 18 '15 at 11:47
0

The correct order of the records is the one described by the Identity column.
The deviation you describe is well within the acceptable range of accuracy for DateTime data type (it's 0.00333 ms).

The problem is that Datetime data type is simply not that accurate. you should use Datetime2 if your sql server version supports it (that is, 2008 or above).

datetime values are rounded to increments of .000, .003, or .007 seconds, as shown in the following table.

Check out this comparison table between date and time data types in MSDN.

Read this article for a comprehensive explanation.

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • 2
    So while time is moving forward, rounding errors can make it appear it go backwards? If time in one instance get rounded to .560, after that, can time really rounded down to .557? I don't buy that. I think this is simply a race condition due to indexes and similar that made one row insertion complete before another, but the race to obtain the next available identity id was completed in the opposite order. – Lasse V. Karlsen May 18 '15 at 11:58
  • @LasseV.Karlsen: As I wrote in my comment to the answer by Landscape penguin: **the 0.003 deviation from the expected data that the OP presented is well within the accuracy declared by Microsoft of the DateTime data type**. You don't have to buy that from me, I'm not selling it, it's a simple fact declared by Microsoft. Even if the inserts are performed one by one, they can still be performed in the exact same time by different users, or in a very close proximity by a fast processing computer. I stand by my answer unless a better one is presented. – Zohar Peled May 18 '15 at 12:01
  • I did not say you were wrong. I said **I** don't buy it. What you're saying is that if I first do a `GETDATE()`, then after that do another `GETDATE()` I may get the second value lower than the first. I don't buy that. I understand very well that almost certainly I will get the same value, but I still don't buy that the inaccuracy makes the values round up and down past each other like that. That DATETIME has a *storage* accuracy problem does not mean that the code that works out the value has a problem reading the actual time of the PC which is sure to be increasing. – Lasse V. Karlsen May 18 '15 at 12:06
  • but the problem is not the `GETDATE()`, it's the `DateTime`. I never said that sequential calls to `GETDATE()` can give a descending time. I don't know how Microsoft is calculating the value to store, but I have seen this happen before. My conclusion from both my personal experience and the MSDN documentation is that `DateTime` is not accurate to a single milliseconds. it's best accuracy is as described 3.33 milliseconds. – Zohar Peled May 18 '15 at 12:10
  • GETDATE() returns DATETIME. As such the rounding is already done internally. If there is a difference between DATETIME values during evaluation and expressions, vs. DATETIME storage this is not documented. Also, the table in the linked article seems to suggest that values can round both ways, but **since they have done the job of giving concrete examples and how they round** I very much doubt that sequential time will sometimes round up and sometimes round past what previous values have rounded up to. – Lasse V. Karlsen May 18 '15 at 12:15
  • The way I see it you're asking me to believe that TIME and TIME+1ms can end up with (TIME+1ms) < (TIME) when stored as a datetime. But anyway, what I believe is completely irrelevant. – Lasse V. Karlsen May 18 '15 at 12:16
  • I'm not asking you to believe anything. I'm just pointing out what I think causes the problem the OP is reporting. I may be wrong about it, but I think I'm correct. – Zohar Peled May 18 '15 at 12:31
  • @LasseV.Karlsen ZoharPeled Why not as simple as multi-thread, multi-core some thread/cores run faster between reserving the ID and retrieving the datetime and let it go at that? – Hogan May 18 '15 at 13:26
  • @Hogan This is also a possibility of what causes this. I have to say that I'm not a 100% sure that what I've written in my answer is the actual *reason* for this odd behavior, but nevertheless I stand by my suggestion to the OP to use `Datetime2` and `SysDateTime()` instead of `DateTime` and `GetDate()`. I believe that even if it is the result of a race condition and not of wrong rounding by Microsoft, it's still better to use the more accurate data type if precision matters. – Zohar Peled May 18 '15 at 13:31
  • @ZoharPeled - race condition means something different than the way you are using it here. – Hogan May 18 '15 at 13:32
  • @Hogan you are correct, I mean just race, not race condition. I was talking about the comments on the question itself by Lasse and Demian. – Zohar Peled May 18 '15 at 13:35
0

The "right" order of rows is the order of the identity column. While the primary purpose of the identity column is to provide a unique key, it is guaranteed to be unique and ascending. The datetime column does not have the accuracy to determine a correct sort order. Even if you changed to a datetime2 column, there is no requirement that the times always be ascending or unique.

They are also both "correct" from a certain point of view. The identity column correctly represents the order that the identity value was determined and the datetime column correctly represents the time that the current time was read.

Since a database server is doing a lot of multitasking/multithreading, it cannot guarantee that an entire transaction will take place before another entire transaction. Due to the way that tasks are serviced inside the database, one transaction might be started, another started and completed, and then the first one completed.

Grax32
  • 3,986
  • 1
  • 17
  • 32
  • 1
    I don't believe it's documented when the database obtains identity values versus when the database evaluates default column values. Given that both the sequences `G1, G2, I2, I1` and `I2, I1, G1, G2` give the observed results (where `Gx` is a connection calling `GETDATE()` and `Ix` is a connection obtaining an identity value), either connection may have started first, or completed first, or however else you want to define which one occurred first. And neither of these sequences requires a "time travelling" `GETDATE` as several answers seem to believe may exist. – Damien_The_Unbeliever May 18 '15 at 13:50
  • @Damien_The_Unbeliever - please make this an answer so I can up-vote it. You are clearly correct. It is the same as my comment Zohar but stated better. – Hogan May 19 '15 at 13:44
0

One would hope that the identity column is the right order, however, it may come down to how the data was inserted. If you have data in a DatTable, for example, and are doing a SQLBulkCopy into a table with an identity column and configured for the database engine to create the identity values, they will NOT necessarily be in the same order as the DataTable data. Evidently the SQLBulkCopy operation is re-ordering the data is a way that seems good to it but that doesn't match the order of the rows in the DataTable.

In this case, if you want to know the real insert order, you'll need to have the application, not the database, set the value of an identity/order field of your making. Either that or don't use BulkCopy.