How to update the origin table in the CDC workflow (via SSIS)?

Question

I have a CDC process setup, whereby TableA's additional rows (or updates) are automatically picked up by an ETL and put into a TableB

TableA >>CDC>> TableB

The CDC works fine, except I want to update the first table once the CDC process is finished. I want to update the table by populating it with the "extraction date". So my tableA has, lets say: Name, Age, OtherInfo, ExtractionDate. CDC is setup on Name,Age and OtherInfo columns (extractionDate column is excluded for obvious reasons). Then, once CDC is performed on TableA and it's taken to TableB, I'd like to populate TableA's "extractionDate" with the current date. However, given I do not know which rows are being moved, I am having difficulty populating the column. Specifically, how can I make a "selective" where clause to select the "changed" rows, when that's only known to SSIS.

score 0 · Answer 1 · answered Jun 09 '15 at 16:30

In the Table A database there are system tables that were created as part enabling CDC. You should be able to easily find the table associated with Table A. This is where MSSQL keeps track of all the changes.

The __$start_lsn is a timestamp of when the change was made and your SSIS imports use this value to bring across a range of changes. The lsn_time_mapping lets you look up the timestamp so it easier to understand.

In my processing I store the start and end lsn values so I know what was brought across with each SSIS run. I could then use these lsn values to go back to this CDC source table and see all the changes that MSSQL has tracked during that time-span.

Keep in mind that the CDC system tables are automatically cleaned out every few days - so you wouldn't be able to applyt this logic historically - only for recent imports.

How to update the origin table in the CDC workflow (via SSIS)?

1 Answers1