4

Is there a way to determine the order in which the rows were committed into an append-only table? Suppose this is the schema:

CREATE TABLE [Operation] (
    [Id] INT PRIMARY KEY IDENTITY,
    [OperationId] UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID())
);

A service application will insert a row into this table on each call that it gets from clients where some other tables are also modified (all wrapped in a single transaction). [OperationId] GUIDs can be generated by either the client and passed in as a parameter or by default on the server (values are randomized, no sequential GUIDs or something), it functions as a correlation ID and/or a retry token.

The requirement is: given an [OperationId] the service needs to find all the operations that happened (were committed) after it and also return the last operation. (Clients will save the last operation's Id and use it next time to get the new delta.)

The interesting scenario here looks something like this:

  1. Start and Commit OP_1
  2. Start OP_2
  3. Start and Commit OP_3
  4. GetNewOps(LastOp=OP_1) returns [OP_3]
    • Note OP_2 is not committed yet.
    • Client can infer that OP_3 is the last operation.
  5. Commit OP_2
  6. Start and Commit OP_4
  7. GetNewOps(LastOp=OP_3) should return [OP_2, OP_4].
    • A naive implementation sorting by [Id] would only return [OP_4], and OP_2 is lost since the next call would be GetNewOps(LastOp=OP_4).

A transaction will modify other tables besides [Operation] and GetNewOps(@LastOpId) will read those other tables as well. The implication is that once a transaction commits we can't filter out its changes with something like WHERE [Operation].[IsCompleted]=1 (where [IsCompleted] flag is set at the end of the transaction or something) because we don't know the closure of all affected tables.

I believe the only definitive way to get the commit order is by the parent transactions' commit LSN or log block number (i.e. sort by when COMMIT TRANSACTION was physically written to the DB log).

Getting LSN from sys.dm_tran_database_transactions

  • ALTER TABLE [Operation] ADD [TransactionId] BIGINT DEFAULT(CURRENT_TRANSACTION_ID()) and then just before commit query for sys.dm_tran_database_transactions.database_transaction_last_lsn, store that in another [Operation] column and use that for ordering. It's not clear how reliable or deterministic this is in terms of when data is flushed to disk to generate the LSNs. There is still room for a race condition between reading database_transaction_last_lsn and COMMIT TRANSACTION however.

There is the sys.dm_tran_database_transactions.database_transaction_commit_lsn column which sounds perfect, but of course it's NULL for active transactions and committed transactions disappear from that view - effectively making that column unusable.

Change Tracking

Change Tracking seems promising, although exact semantics of its various columns are still remain murky to me. I think sys.dm_tran_commit_table.commit_lbn is what I need however the best join key I could come up with (tx.commit_ts = c.sys_change_version) seems very weak:

SELECT *
FROM [Operation] AS o
LEFT JOIN CHANGETABLE(CHANGES [Operation], NULL) AS c ON c.[Id] = o.[Id]
LEFT JOIN sys.dm_tran_commit_table AS tx ON tx.commit_ts = c.sys_change_version
WHERE c.sys_change_operation = 'I'

xdes_id appears to be the right lookup key but I can't figure out a way to obtain it (it's different than sys.dm_tran_*.transaction_id/CURRENT_TRANSACTION_ID()).

One more downside is this approach requires two separate transactions (to generate the actual CHANGETABLE and sys.dm_tran_commit_table entries).

(I didn't really look at Change Data Capture (CDC) since it's not supported in Azure SQL but those tables do contain LSN information for particular rows.)

Serguei
  • 2,910
  • 3
  • 24
  • 34
  • Does your code always write to `Operation` table last? – Alex Aug 01 '19 at 21:55
  • No, there are some FKs referencing `Operation.Id` for example. – Serguei Aug 01 '19 at 22:43
  • 1
    You can try adding a [`rowversion`](https://learn.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql?view=sql-server-2017) column and setting it at the very end of your transaction. – Alex Aug 01 '19 at 23:26
  • 1
    @Alex, `ROWVERSION`/`TIMESTAMP` suffers from the same problem as `IDENTITY` columns in the example - it's generated at and reflects the order in which the `INSERT` statements happened, not the `COMMIT`s. – Serguei Aug 01 '19 at 23:40
  • 2
    If you're using transactions around your inserts then there should be blocking on Operation's primary key index causing the inserts to be serialized. i.e.: The commit order will be the same as the insert order. Are you seeing otherwise? – AlwaysLearning Aug 02 '19 at 00:01
  • @AlwaysLearning there's no blocking - each transaction is inserting a single row into the table and only takes an exclusive lock on the new KEY resource, the PAGE/OBJECT locks are IX and don't conflict. Yes one 'solution' would be to force a TABLOCK on `[Operation]` and avoid all concurrency, but that's really missing the point. – Serguei Aug 02 '19 at 00:19
  • Interesting. You can't add a column `[Created] [datetime/datetime2] default (getutcdate())`, either, because the time will be the be the insert time, not the commit time. – AlwaysLearning Aug 02 '19 at 00:25
  • To clarify: My suggestion was that you update the row with `ROWVERSION` just before commit. I think it is common sense that there is no way of knowing in advance in which order transactions will be committed. – Alex Aug 02 '19 at 00:25
  • @Alex, there is a serialization of transactions when they are written to the log - that's what I want to get at - and obviously that query is going to be post the original transaction. Also you can't update `ROWVERSION`/`TIMESTAMP` columns directly, the engine populates them automatically; same as getting `SYSDATETIME()` at the end it's an approximation and doesn't prevent the race condition in the OP example. – Serguei Aug 02 '19 at 00:53
  • @Serguei, transaction events are not written to the log consecutively, but as data changes occur. The last event is a tran commit event. (https://learn.microsoft.com/en-us/sql/relational-databases/logs/the-transaction-log-sql-server?view=sql-server-2017). Short of reading the log file I do not see a way to reliably know the order of commits. The second best approach, IMO, is to add ROWVERSION column together with some other column, say `IsCompleted BIT`, then update IsCompleted at the end of the operation just before final commit. Table locking (e.g. `with(updlock)`) will be needed. – Alex Aug 02 '19 at 02:08
  • "serialization of transactions" was referring to https://en.wikipedia.org/wiki/Serializability And I should clarify in my example `GetNewOps(LastKnownOp)` also reads other table data, so even ignoring where `Operation.IsCompleted=0` the transaction's changes are still visible. There is `sys.dm_tran_database_transactions.database_transaction_commit_lsn` but it's either NULL (transaction active) but committed transactions don't show up in that view. I really really want to avoid messing with `fn_dblog()` since it's undocumented. – Serguei Aug 02 '19 at 16:35
  • What do you mean by "so even ignoring where Operation.IsCompleted=0 the transaction's changes are still visible"? What tran Isolation Level are your processes using? I am a little suspicious of XY problem here. You mention that this `Operation` table is used by client apps to get the new "delta". Please update your question with a little more background info: what info this delta includes, types of "table changes" that are made by processes e.g. updates or inserts/deletes? – Alex Aug 02 '19 at 21:26
  • Transactions use SNAPSHOT isolation (but `READ COMMITTED` and `SERIALZABLE` would be equivalent I think). A transaction modifies other tables besides `Operation` and `GetNewOps()` reads those other tables too, so simply `WHERE Operation.IsCompleted=0` will ignore only part of a transaction's effect, the exact changes to other tables are complex and aren't material to the question, it's boiled down to the basics. I can't explain it any better than that. sqlfiddle doesn't support parallel sessions so couldn't setup a repro there. – Serguei Aug 05 '19 at 17:57

0 Answers0