I'm having a little trouble following the article on how to use the CDC Control Task. Specifically, I seem to be unable to process the initial load in such a way that the subsequent incremental load is seamless (that is, no gap and no overlap) with the initial load. Unfortunately, I don't have the luxury of a quiesced database (i.e. there will be active changes while I'm doing the initial load). Here's what I've tried:

In all cases, my incremental load is simple: a CDC control task with the operation set to "Get processing range", a data flow task that contains a CDC source and an ADO.NET destination, and another CDC control task whose operation is "Mark processed range".
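As a rough mental model of those three steps (this is not the SSIS API; `CdcState`, `get_processing_range`, `mark_range_processed`, and `run_incremental_load` are hypothetical names, and LSNs are modeled as plain integers), the loop looks something like this:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CdcState:
    """Hypothetical stand-in for the persisted CDC state."""
    last_processed_lsn: int


def get_processing_range(state: CdcState, max_lsn: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive LSN range to process next, or None if nothing is new."""
    start = state.last_processed_lsn + 1
    if start > max_lsn:
        return None
    return (start, max_lsn)


def mark_range_processed(state: CdcState, processed: Tuple[int, int]) -> None:
    """Advance the stored position to the end of the range just processed."""
    state.last_processed_lsn = processed[1]


def run_incremental_load(
    state: CdcState, change_log: Dict[int, str], max_lsn: int
) -> List[Tuple[int, str]]:
    """One run: get the range, read the changes in it, then mark it processed."""
    lsn_range = get_processing_range(state, max_lsn)
    if lsn_range is None:
        return []
    start, end = lsn_range
    batch = [(lsn, change_log[lsn]) for lsn in sorted(change_log) if start <= lsn <= end]
    mark_range_processed(state, lsn_range)
    return batch
```

In this model, consecutive runs cover adjacent ranges, which is the no-gap, no-overlap behavior I'm after. The open question is how to get the initial load to leave the stored state at the right starting point.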

For the initial load, I've tried the following two scenarios:

A CDC control task in which the operation is set to "Mark CDC start", using a database snapshot that I created specifically for this task. The only other task is a data flow task that contains an ADO.NET source that reads from the change table directly and an ADO.NET destination. In this scenario, the initial load runs fine, but the subsequent incremental load fails with an error saying that the starting LSN for the processing range is greater than the ending LSN.

The other initial load that I've tried has a CDC control task whose operation is set to "Mark initial load start", the same data flow as above (but this time, out of the live database instead of a database snapshot), and another CDC control task whose operation is "Mark initial load end". In this scenario, I get duplicate CDC records processed when I run the incremental load.

What am I missing?

Ben Thul

1 Answer

This page states that

when processing changes, care should be taken when processing changes made in parallel to the initial load as some of the processed changes are already seen in the initial load (for example, an Insert change may fail with a duplicate key error because the inserted row was read by the initial load process).
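One way to tolerate that overlap is to apply each change idempotently, so that replaying a change the initial load has already captured is harmless. The following is a minimal sketch, assuming the destination is keyed by primary key (modeled here as a dict) and that changes are applied in LSN order; the `Change` type and `apply_change` helper are hypothetical and not part of SSIS or the CDC components:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Change:
    """Hypothetical change record: an operation, a primary key, and the row image."""
    operation: str  # "insert", "update", or "delete"
    key: int
    row: Optional[dict] = None


def apply_change(target: Dict[int, dict], change: Change) -> None:
    """Apply one change so that replaying it leaves the target unchanged."""
    if change.operation in ("insert", "update"):
        # Upsert: an insert for a row the initial load already copied
        # overwrites it instead of failing with a duplicate key error.
        target[change.key] = dict(change.row or {})
    elif change.operation == "delete":
        # Deleting a row that is already gone is a no-op.
        target.pop(change.key, None)
    else:
        raise ValueError(f"unknown operation: {change.operation}")
```

In SQL terms, this roughly corresponds to treating inserts as upserts (for example, with a `MERGE` statement) and ignoring deletes of rows that no longer exist.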

Ben