
I'm trying to find a way to 'throttle' CDC (Change Data Capture) on SQL Server 2008.

The reason is that under normal circumstances CDC performs brilliantly, but as soon as it has to deal with a 'large' number of rows, performance tanks.

Typical throughput is between 1000 and 3000 rows a second. It starts to die at about 5000 rows per second.

Usually, this is not an issue, since we're using CDC to keep two databases in sync as a near-real-time ETL process for statistical modelling. In the past, for bulk data moves, we've had to come up with dodgy manual methods. I'm wondering if I can throw a huge amount of data at it but tell it to process, say, only 5 transactions at a time, or otherwise force it to work through bite-sized chunks (however long that takes), rather than have it attempt everything at once and suffer for it.

  • @jjj: as the question shows, CDC == Change Data Capture – Mitch Wheat Feb 22 '10 at 05:44
  • More detail: CDC uses the __$start_lsn column in its change tables to identify changes. The normal process is to call cdc.fn_cdc_get_all_changes_<capture_instance> and pass in the @from_lsn and @to_lsn parameters. If you use the default approach, which is min(lsn) to max(lsn), you may get heaps of rows. The simple solution would be to count off 5000 rows in the change table and pass that row's LSN as @to_lsn. But if a single transaction changes 50,000 rows, the LSN value at row 5000 is the same as at row 50,000, and the pseudo-throttle is defeated (see the sketch below). I'm wondering if anyone else has had a similar situation... –  Feb 23 '10 at 03:12
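For anyone following along, here is a minimal sketch of that row-count pseudo-throttle. The capture instance (dbo_MyTable), its change table (cdc.dbo_MyTable_CT) and the staging table (dbo.MyTable_Staging) are hypothetical stand-ins, not the poster's actual objects:

```sql
-- A minimal sketch of the row-count pseudo-throttle described above.
-- dbo_MyTable, cdc.dbo_MyTable_CT and dbo.MyTable_Staging are hypothetical.
DECLARE @from_lsn binary(10), @to_lsn binary(10), @max_lsn binary(10);

SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTable');
SET @max_lsn  = sys.fn_cdc_get_max_lsn();

WHILE @from_lsn IS NOT NULL AND @from_lsn <= @max_lsn
BEGIN
    -- Use the __$start_lsn of (roughly) the 5000th pending row as the
    -- batch boundary. If one transaction touched 50,000 rows they all
    -- share that LSN, so the cap is defeated (the caveat noted above).
    SELECT @to_lsn = MAX(__$start_lsn)
    FROM (SELECT TOP (5000) __$start_lsn
          FROM cdc.dbo_MyTable_CT
          WHERE __$start_lsn >= @from_lsn
          ORDER BY __$start_lsn) AS batch;

    IF @to_lsn IS NULL BREAK;  -- nothing left to process

    INSERT INTO dbo.MyTable_Staging
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

    -- Step past the batch just copied.
    SET @from_lsn = sys.fn_cdc_increment_lsn(@to_lsn);
END;
```

Note that the boundary can only ever be generous: cdc.fn_cdc_get_all_changes_ returns every row up to and including @to_lsn, so a huge transaction at the boundary is included whole rather than split, which is exactly how one 50,000-row transaction blows past the 5000-row cap.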

1 Answer


Please see: Tuning the Performance of Change Data Capture in SQL Server 2008

Are you sure that CDC is the right solution for what you are trying to achieve? I'm just wondering whether SQL Server Change Tracking with ADO.NET Sync Services might be more appropriate.
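For anyone weighing that suggestion, a minimal sketch of the Change Tracking approach is below (MyDb, dbo.MyTable and its Id primary key are hypothetical). The trade-off, as the comments note, is that Change Tracking records only which rows changed, not how their values changed over time:

```sql
-- A minimal sketch of the Change Tracking alternative (database, table
-- and the Id primary key are hypothetical). Unlike CDC, Change Tracking
-- records only WHICH rows changed, not the history of their values.
ALTER DATABASE MyDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- Sync pattern: keep the version from the last pull, then fetch only
-- rows changed since then and join back for the current values.
DECLARE @last_sync_version bigint = 0;  -- persisted by the ETL process

SELECT ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION, t.*
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS ct
LEFT JOIN dbo.MyTable AS t ON t.Id = ct.Id;

SELECT CHANGE_TRACKING_CURRENT_VERSION();  -- save as the next @last_sync_version
```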

Mitch Wheat
  • I've read through that. Changing the values didn't affect the 'tank' point. I think this is because those values control the flow of data into the change tables, while my issue is with the flow of data out of those same tables. I'm thinking about using the __$start_lsn column and then some kind of row-count operation to determine how big those transactions are (each __$start_lsn should be analogous to a 'transaction' for all intents and purposes). I'd then know how many transactions, or LSNs, I can process in a batch (see the sketch after these comments). But I'm kinda struggling with the concepts! –  Feb 22 '10 at 05:51
  • I made it a bit simplistic in my question. We do a little more than merely sync: we're using CDC so we get a historical view of how the data changed over time (which is important to our statistical modelling process). –  Feb 22 '10 at 06:06
  • You might want to ask a more detailed question, and someone might have ideas. – Mitch Wheat Feb 22 '10 at 06:25
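
For completeness, here is a sketch of the transaction-counting variant floated in the comments above: end each batch at the Nth distinct __$start_lsn rather than the Nth row, so a batch holds at most N transactions no matter how many rows each one changed. Object names are the same hypothetical stand-ins as before:

```sql
-- A sketch of the transaction-counting variant from the comments above:
-- end each batch at the 5th distinct __$start_lsn, so a batch holds at
-- most 5 transactions regardless of how many rows each one changed.
-- Object names are the same hypothetical stand-ins as before.
DECLARE @from_lsn binary(10), @to_lsn binary(10), @max_lsn binary(10);

SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTable');
SET @max_lsn  = sys.fn_cdc_get_max_lsn();

WHILE @from_lsn IS NOT NULL AND @from_lsn <= @max_lsn
BEGIN
    SELECT @to_lsn = MAX(start_lsn)
    FROM (SELECT DISTINCT TOP (5) __$start_lsn AS start_lsn
          FROM cdc.dbo_MyTable_CT
          WHERE __$start_lsn >= @from_lsn
          ORDER BY __$start_lsn) AS batch;

    IF @to_lsn IS NULL BREAK;  -- backlog drained

    INSERT INTO dbo.MyTable_Staging
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

    SET @from_lsn = sys.fn_cdc_increment_lsn(@to_lsn);
END;
```

Because the boundary is always a complete LSN, this version never splits a transaction, and unlike the row-count approach it cannot be defeated by one very large transaction; a single 50,000-row transaction simply becomes a batch of one.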