Is there a way to determine the order in which the rows were committed into an append-only table? Suppose this is the schema:
CREATE TABLE [Operation] (
[Id] INT PRIMARY KEY IDENTITY,
[OperationId] UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID())
);
A service application will insert a row into this table on each call that it gets from clients where some other tables are also modified (all wrapped in a single transaction). [OperationId]
GUIDs can be generated by either the client and passed in as a parameter or by default on the server (values are randomized, no sequential GUIDs or something), it functions as a correlation ID and/or a retry token.
The requirement is: given an [OperationId]
the service needs to find all the operations that happened (were committed) after it and also return the last operation. (Clients will save the last operation's Id and use it next time to get the new delta.)
The interesting scenario here looks something like this:
- Start and Commit
OP_1
- Start
OP_2
- Start and Commit
OP_3
GetNewOps(LastOp=OP_1)
returns[OP_3]
- Note
OP_2
is not committed yet. - Client can infer that
OP_3
is the last operation.
- Note
- Commit
OP_2
- Start and Commit
OP_4
GetNewOps(LastOp=OP_3)
should return[OP_2, OP_4]
.- A naive implementation sorting by
[Id]
would only return[OP_4]
, andOP_2
is lost since the next call would beGetNewOps(LastOp=OP_4)
.
- A naive implementation sorting by
A transaction will modify other tables besides [Operation]
and GetNewOps(@LastOpId)
will read those other tables as well. The implication is that once a transaction commits we can't filter out its changes with something like WHERE [Operation].[IsCompleted]=1
(where [IsCompleted]
flag is set at the end of the transaction or something) because we don't know the closure of all affected tables.
I believe the only definitive way to get the commit order is by the parent transactions' commit LSN or log block number (i.e. sort by when COMMIT TRANSACTION
was physically written to the DB log).
Getting LSN from sys.dm_tran_database_transactions
ALTER TABLE [Operation] ADD [TransactionId] BIGINT DEFAULT(CURRENT_TRANSACTION_ID())
and then just before commit query forsys.dm_tran_database_transactions.database_transaction_last_lsn
, store that in another[Operation]
column and use that for ordering. It's not clear how reliable or deterministic this is in terms of when data is flushed to disk to generate the LSNs. There is still room for a race condition between readingdatabase_transaction_last_lsn
andCOMMIT TRANSACTION
however.
There is the sys.dm_tran_database_transactions.database_transaction_commit_lsn
column which sounds perfect, but of course it's NULL
for active transactions and committed transactions disappear from that view - effectively making that column unusable.
Change Tracking
Change Tracking seems promising, although exact semantics of its various columns are still remain murky to me. I think sys.dm_tran_commit_table.commit_lbn
is what I need however the best join key I could come up with (tx.commit_ts = c.sys_change_version
) seems very weak:
SELECT *
FROM [Operation] AS o
LEFT JOIN CHANGETABLE(CHANGES [Operation], NULL) AS c ON c.[Id] = o.[Id]
LEFT JOIN sys.dm_tran_commit_table AS tx ON tx.commit_ts = c.sys_change_version
WHERE c.sys_change_operation = 'I'
xdes_id
appears to be the right lookup key but I can't figure out a way to obtain it (it's different than sys.dm_tran_*.transaction_id
/CURRENT_TRANSACTION_ID()
).
One more downside is this approach requires two separate transactions (to generate the actual CHANGETABLE
and sys.dm_tran_commit_table
entries).
(I didn't really look at Change Data Capture (CDC) since it's not supported in Azure SQL but those tables do contain LSN information for particular rows.)