0

If an ETL process attempts to detect data changes on system-versioned tables in SQL Server by including rows as defined by a rowversion column to be within a rowversion "delta window", e.g.:

where row_version >= @previous_etl_cycle_rowversion
  and row_version < @current_etl_cycle_rowversion

.. and the values for @previous_etl_cycle_rowversion and @current_etl_cycle_rowversion are selected from a logging table whose newest rowversion gets appended to said logging table at the start of each ETL cycle via:

insert into etl_cycle_logged_rowversion_marker (cycle_start_row_version)
select @@DBTS

... is it possible that a rowversion of a record falling within a given "delta window" (bounded by the 2 @@DBTS values) could be missed/skipped due to rowversion's behavior vis-à-vis transactional consistency? - i.e., is it possible that rowversion would be reflected on a basis of "eventual" consistency?

I'm thinking of a case where say, 1000 records are updated within a single transaction and somehow @@DBTS is "ahead" of the record's committed rowversion yet that specific version of the record is not yet readable...

(For the sake of scoping the question, please exclude any cases of deleted records or immediately consecutive updates on a given record within such a large batch transaction.)

jjt
  • 125
  • 1
  • 10

2 Answers2

1

If you make sure to avoid row versioning for the queries that read the change windows you shouldn't miss many rows. With READ COMMITTED SNAPSHOT or SNAPSHOT ISOLATION an updated but uncommitted row would not appear in your query.

But you can also miss rows that got updated after you query @@dbts. That's not such a big deal usually as they'll be in the next window. But if you have a row that is constantly updated you may miss it for a long time.

But why use rowversion? If these are temporal tables you can query the history table directly. And Change Tracking is better and easier than using rowversion, as it tracks deletes and optionally column changes. The feature was literally built for to replace the need to do this manually which:

usually involved a lot of work and frequently involved using a combination of triggers, timestamp columns, new tables to store tracking information, and custom cleanup processes

.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • This helps clarify and I'll look into Change Tracking as an alternative approach. After further considering the `rowversion` mechanism and testing concurrent sessions, it's clear that `rowversion` will generate the entire series of values within a multi-record batch before it becomes readable, whereas inspecting `@@DBTS` while batch is executing will yield a value **ahead** of the entire sequence of `rowversion` values within the batch - i.e., an irreconcilable race condition where the possibility of "skipping" records definitely exists! – jjt May 27 '20 at 16:16
  • Regarding usage of temporal tables and querying the history table directly, are you suggesting to use the "system-versioned" timestamp columns for predicates? If so, the problem with this is that such a column, e.g., `SysStart` corresponds to TRANS BEGIN - so again, the situation exists where these timestamps aren't yet readable until COMMIT, producing another race condition gap... – jjt May 27 '20 at 16:37
  • 1
    But without snapshot/rcsi you would block until they are committed. So you wouldn't miss the rows. Even in SNAPSHOT you could probably use SNAPHOT transactions and track the change windows with the first_snapshot_sequence_num instead of the wall clock time. – David Browne - Microsoft May 27 '20 at 16:47
  • Good points - I was simply testing with an inspect of `@@DBTS` during batch tran, forgetting that default RC will block reads during write txns. Leveraging `first_snapshot_sequence_num` sounds like the silver bullet here since we're opting to employ SNAPSHOT as far as possible within our shop. – jjt May 27 '20 at 17:12
1

Under SNAPSHOT isolation, it turns out the proper function to inspect rowversion which will ensure contiguous delta windows while not skipping rowversion values attached to long-running transactions is MIN_ACTIVE_ROWVERSION() rather than @@DBTS.

jjt
  • 125
  • 1
  • 10