0

I want to implement an incremental load process using SQL Server Change Data Capture. Every example I find takes the "happy path."

In other words, they assume that the CDC history exceeds the time since the last successful incremental load.

Suppose we leave the cleanup job with the default of 3 days, and for some reason our load hasn't successfully completed for longer than that. I need to check for this and run a full extract instead.

I'm logging the successful execution datetime in SQL Server tables. So, if I compare the last successful date to the earliest record in the cdc.lsn_time_mapping table, will this accomplish my task?

Basically something like:

Select @LastSuccessfulDate from audit.... 

Select @MinCdCDate = min(tran_begin_time) from cdc.lsn_time_mapping

if @MinCdCDate > @LastSuccessfulDate then 'Full' else 'Incremental'

Should this work? Is there a better way to accomplish it?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
DavidStein
  • 3,149
  • 18
  • 41
  • 62

1 Answers1

1

I would always stay in the "log domain" not the "time domain" when working directly with CDC. So track the last LSN of the last run and compare it against sys.fn_cdc_get_min_lsn every time you syncronize.

So if you last synchronized at lsn=100, and the min_lsn=110, then you've got a gap of 10 missing log records.

But this is only one of many scenarios that will require you to reinitialize the replication with a full sync, so you should also have an input parameter or somesuch to force a full sync.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • Thanks for your response, but even if I start keeping track of the last successful LSN (instead of Date) using sys.fn_cdc_get_min_lsn won't tell me if I've missed my incremental window anyway, right? – DavidStein May 13 '22 at 19:24
  • I'm missing something here. The max LSN value is returned for the entire database using sys.fn_cdc_get_max_lsn(). However, sys.fn_cdc_get_min_lsn returns the minimum valid LSN for a specific table. So, if my last "Max LSN" was 100, and there are no new changes in a specific table, this would also cause a full sync, right? I think what I need to do is determine the minimum valid LSN for the entire database and compare that instead. – DavidStein May 15 '22 at 17:03
  • In that case the min_lsn would be < 100, so you would request all the changes from 101 through the current max_lsn, say 200, which would contain no rows. – David Browne - Microsoft May 15 '22 at 18:51
  • Assuming that I'm storing the last successful LSN value in a separate logging/control database, how do I retrieve that into an ADF pipeline and convert it back into binary(10) for a copy activity? ADF parameters only support string, int, float, etc... ADF variables have even fewer options. – DavidStein May 19 '22 at 13:29
  • You should be able to store and pass the value as a hex string, eg “0xA428B3…”. – David Browne - Microsoft May 19 '22 at 14:05
  • I would have thought so, but the following testing does not bear that out. I can't post the code sample in this comment. However, when I get the min lsn and paste it into something like this I get the 313 error message when using this binary(10) parameter : SET @converted_from_lsn = convert(binary(10), '0x0000004F00000A280003' ) – DavidStein May 19 '22 at 14:50
  • 1
    Construct the call like `exec foo 0x0000004F00000A280003` not `exec foo '0x0000004F00000A280003'`. Or first convert from varchar like this `SET @converted_from_lsn = convert(binary(10), '0x0000004F00000A280003', 1 )`. The conversion style is required. – David Browne - Microsoft May 19 '22 at 18:35