0

My DB is CDC enabled and change capture was working fine until two weeks back where the CT tables are updating only once in 3 to 4 hours.

Version: SQL Server 11.0.5613

Things I tried: 1. Stopped and restarted the capture job 2. Restarted the SQL server process. 3. Experimenting with capture delay (Found that the updates are populated in CT tables only once in 4 hours in batches) 4. Tried increasing @maxtrans to 1000, but that dint really make the system catch up. It still runs once in 4 hours.

I did some research based on following outputs:

exec sys.sp_cdc_help_jobs

enter image description here

The maxtrans here refers to the number of transactions being processed per scan cycle. (Which was increased to 1000 from 500). It is different from the number of log_records being processed. If you look at the tran_count column which is the number of transaction being processed in the “sys.dm_cdc_log_scan_sessions” table you will see that it is always below 500. (session_id = 0 shows 1052 because it is a special entry denoting a total of all transactions processed since the server was re-started which was on the 7th of January.) So I am not sure if the issue is with maxtrans.

SELECT latency, empty_scan_count, * FROM sys.dm_cdc_log_scan_sessions order by end_time desc

enter image description here

From this analysis it looks like the job does nothing for 4 hours or it has nothing to do for four hours (which seems unlikely). Then process all updates until that time and then stalls for 4 hours. The only thing that coincides with this is the frequency of the transaction log backup job which runs every 4 hours between 12:00 a.m. and 11:59 p.m. I am not sure if they are related somehow. I tried to change the frequency of the job and see the impact.It did not have any.

Any pointers could be of great help.

Ben Thul
  • 31,080
  • 4
  • 45
  • 68
lostalien
  • 115
  • 1
  • 12
  • Whether it's related or not, running log backups every 4 hours seems not frequent enough. Have you tried running them more frequently to see what (if any) impact there is on your CDC performance? – Ben Thul Jan 16 '18 at 21:02
  • I tried running the logs every hour and it had no impact on the time signatures on the log scan sessions. -- It just had the same 4 hour pattern. – lostalien Jan 16 '18 at 21:15

1 Answers1

4

A secondary node was added to the availability group. This server was added, but there was no database set up on the server. When records get written to the primary, it tries to propagate it to all secondary nodes. Since it could not successfully do this, the log never officially gets hardened, which means cdc never picks it up. Cdc was constantly waiting for the transaction to finish, which is why no work is done on the capture job, and it just hangs there. When this times out, that is when records finally get recorded into the capture table, but by that time, it's been 4 hours !

lostalien
  • 115
  • 1
  • 12
  • Is this the answer to your problem? I only ask as it seems more like an additional comment to your question. If it's not the answer, please edit this info into your question, and delete this answer. – DeanOC Jan 17 '18 at 22:05
  • 1
    This is the answer to the problem and it also has root cause explained in it. – lostalien Feb 01 '18 at 00:46