When a .NET application listens to the change feed, its requests to CosmosDB can get throttled (HTTP response 429) during spikes of high usage on a collection.

There are 3 flavours of the CosmosDB change feed client for .NET:

  1. The original: Microsoft.Azure.DocumentDB.ChangeFeedProcessor v1.x.

  2. Its v2.x successor is API-compatible but has sweeping code changes.

  3. The current implementation, which is part of Microsoft.Azure.Cosmos v3.

Which of these versions, if any, provides a way to plug into the error handling (for example, to emit custom telemetry about it)?

The original library retries HTTP 429 responses internally, and I wasn't able to find a way to hook into the retry mechanism.

Cristian Diaconescu
  • Why do you want to hook into the 429 retries? Just to get telemetry? All the versions you described will retry and handle 429s automatically; even if you are constantly throttled, they will keep retrying. – Matias Quaranta Jan 27 '20 at 16:30
  • Yes, telemetry (and alerting feeding on that). Also, transparency. I know about the unified Azure metrics, they're not a good fit for our use case. We have tight SLOs around latency and throttling can throw in a big spanner. – Cristian Diaconescu Jan 27 '20 at 19:33
  • I looked a bit at the decompiled code, and the internal `BackoffRetryUtility.ExecuteAsync(...)` methods it uses do take an `Action` parameter that is called after temporary failures. That would be exactly what I need, but again, it's not exposed anywhere. – Cristian Diaconescu Jan 27 '20 at 19:51
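The pattern described in that last comment, a backoff loop that invokes a callback after each transient failure, can be sketched as follows. This is an illustrative Python sketch, not the SDK's internal code: `ThrottledError`, `execute_with_backoff`, `on_retry` and the exponential delay policy are all hypothetical stand-ins.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def execute_with_backoff(
    operation: Callable[[], T],
    on_retry: Optional[Callable[[int, Exception, float], None]] = None,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on ThrottledError with exponential backoff.

    `on_retry(attempt, error, delay)` is called after each transient failure,
    before sleeping; this is where custom telemetry could be emitted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as error:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** (attempt - 1))
            if on_retry is not None:
                on_retry(attempt, error, delay)  # the telemetry hook
            sleep(delay)
    raise AssertionError("unreachable")
```

For example, `execute_with_backoff(fetch, on_retry=lambda a, e, d: print(a, d))` would report every throttled attempt before waiting; in the libraries discussed here the equivalent callback exists internally but is not exposed.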

1 Answer

There is no SLA or guarantee for those libraries in terms of latency or speed, because latency also depends on the code you use to process the changes and on the infrastructure where you deploy it (for example, its location and network latency).

The only guarantee is "at least once" delivery of the changes, as long as the container is available and has enough provisioned throughput to serve the requests.
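"At least once" means the same change can reach your code more than once, for example when a batch is redelivered after a failure that happened before its progress was checkpointed. A common way to cope is to make processing idempotent. A minimal Python sketch, assuming each change carries its `id` and `_etag` (the Cosmos DB system property that changes on every write); `IdempotentHandler` is hypothetical, and its in-memory set would need durable storage in a real deployment.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


class IdempotentHandler:
    """Skips changes already processed, keyed by (id, _etag)."""

    def __init__(self, process: Callable[[Dict], None]) -> None:
        self._process = process
        # In-memory only for this sketch; lost on restart.
        self._seen: Set[Tuple[str, str]] = set()

    def handle_batch(self, changes: Iterable[Dict]) -> int:
        """Process each unseen change and return how many were processed."""
        processed = 0
        for change in changes:
            key = (change["id"], change["_etag"])
            if key in self._seen:
                continue  # duplicate delivery of the same document version
            self._process(change)
            self._seen.add(key)  # mark only after processing succeeded
            processed += 1
        return processed
```

Redelivering the same batch is then a no-op, while a newer version of the same document (different `_etag`) is still processed.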

This means that the Change Feed Processor will always keep retrying in order to fulfill the guarantee of delivering the changes. If the container is throttled, it will keep trying until it succeeds; there is no guarantee of how long it will take for you to get the changes.

When the Cosmos DB service returns a 429 (throttled) response, it also returns a header that indicates how long to wait before retrying the request. The Change Feed Processor honors this wait time and then retries.
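The behaviour described in the two paragraphs above (no attempt limit, waiting as long as the service asks) can be sketched as follows. This is an illustrative Python sketch, not the Change Feed Processor's actual code: `Response`, `send_until_not_throttled` and the 1-second fallback are hypothetical; `x-ms-retry-after-ms` is the header the Cosmos DB REST API uses to carry the wait time in milliseconds.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: object = None


def retry_after_seconds(response: Response, default: float = 1.0) -> float:
    """Read the server's wait hint (milliseconds) and convert to seconds.

    Falls back to a hypothetical `default` when the header is missing.
    """
    raw = response.headers.get("x-ms-retry-after-ms")
    return float(raw) / 1000.0 if raw is not None else default


def send_until_not_throttled(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry with no attempt limit for as long as the service answers 429."""
    while True:
        response = send()
        if response.status != 429:
            return response
        sleep(retry_after_seconds(response))  # honor the server's wait time
```

Nothing in this loop reports the throttles to its caller; from the outside they only show up as extra latency.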

If what you need is telemetry, it is available in Azure Diagnostics: you can see the throttled requests there and even filter by User Agent to identify which application is receiving them.

Matias Quaranta