I'm using Polly.Contrib.WaitAndRetry for HttpClient retries:

    using Polly.Contrib.WaitAndRetry;

    var delay = Backoff.DecorrelatedJitterBackoffV2(
        medianFirstRetryDelay: TimeSpan.FromSeconds(1),
        retryCount: 4,
        fastFirst: true);

I want the maximum time spent waiting to be about 32 seconds at the median (it could be much higher because of the randomness in the jitter, which is why I say median). I read the docs carefully, and they give this as the range for the wait in this API: "between 0 and f * 2^(t+1)".
Here f=1 and t=4, which comes out to max=32. But the WaitAndRetry docs seem to say that the count doesn't include the fastFirst first retry? So if I want the maximum wait to be ~32 sec with fastFirst: true, should retryCount be 4 or 5?
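
One way to check the count question empirically is to enumerate what the library actually yields for both candidate values. This is a minimal sketch, assuming the Polly.Contrib.WaitAndRetry NuGet package is referenced; the class name `BackoffProbe` and the seed value are arbitrary choices for illustration, not anything the library prescribes.

```csharp
using System;
using System.Linq;
using Polly.Contrib.WaitAndRetry;

public static class BackoffProbe
{
    public static void Main()
    {
        foreach (int retryCount in new[] { 4, 5 })
        {
            // A fixed seed (arbitrary value) makes the run repeatable;
            // omit it to see how much the jitter varies between runs.
            TimeSpan[] delays = Backoff.DecorrelatedJitterBackoffV2(
                medianFirstRetryDelay: TimeSpan.FromSeconds(1),
                retryCount: retryCount,
                seed: 12345,
                fastFirst: true).ToArray();

            // Count how many delays come back in total and how many are zero,
            // which shows whether the fast first retry uses up one of the slots.
            int zeroDelays = delays.Count(d => d == TimeSpan.Zero);
            double totalSeconds = delays.Sum(d => d.TotalSeconds);

            Console.WriteLine(
                $"retryCount={retryCount}: {delays.Length} delays, " +
                $"{zeroDelays} zero, total {totalSeconds:F1}s");
            Console.WriteLine(
                "  " + string.Join(", ", delays.Select(d => $"{d.TotalSeconds:F2}s")));
        }
    }
}
```

Running it a few times without the seed gives a feel for the spread of the total wait, which is the number that matters for the ~32 sec target.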
Update
Environment
- I'm running microservices on Kubernetes that communicate over HTTP.
- There are some single-instance workloads that manage state. If one hits OOM, it can be down while restarting:
  - 12-30 sec if it is scheduled on a node that doesn't have the container image and has to download it before restarting
  - 5 sec if the container image is already on the k8s node
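
To make the restart windows above concrete, here is a minimal sketch that maps a retry schedule onto them. It is only a reasoning aid under stated assumptions: the schedule is a hypothetical un-jittered one (a zero-delay first retry, then 1, 2, 4, 8, 16 seconds), the time spent inside each failed attempt is ignored, and `RetryTimeline` and `FirstRetryAtOrAfter` are names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class RetryTimeline
{
    // Returns the 1-based number of the first retry that fires at or after
    // readyAfter (measured from the initial failure), or -1 if no retry does.
    // Assumption: time spent inside each failed attempt is ignored.
    public static int FirstRetryAtOrAfter(IEnumerable<TimeSpan> delays, TimeSpan readyAfter)
    {
        TimeSpan elapsed = TimeSpan.Zero;
        int retryNumber = 0;
        foreach (TimeSpan delay in delays)
        {
            retryNumber++;
            elapsed += delay;
            if (elapsed >= readyAfter)
            {
                return retryNumber;
            }
        }
        return -1;
    }

    public static void Main()
    {
        // Hypothetical, un-jittered schedule for illustration only;
        // real jittered delays will differ on every run.
        TimeSpan[] schedule =
        {
            TimeSpan.Zero,
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(4),
            TimeSpan.FromSeconds(8),
            TimeSpan.FromSeconds(16),
        };

        // Restart durations taken from the environment description: 5, 12 and 30 sec.
        foreach (int restartSeconds in new[] { 5, 12, 30 })
        {
            int retry = FirstRetryAtOrAfter(schedule, TimeSpan.FromSeconds(restartSeconds));
            Console.WriteLine($"restart takes {restartSeconds}s -> first retry that can succeed: #{retry}");
        }
    }
}
```

The same helper accepts any `IEnumerable<TimeSpan>`, so a jittered sequence could be passed in instead of the fixed one.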
Goals by priority
- Retry the HTTP operation against the restarted workload as quickly as possible, but without spamming one request per second.
- I'd also like to use `fastFirst: true`, which would be convenient for cases such as a SQL deadlock (UPDATE or DELETE on a busy table).
- If possible, use the same retry strategy all the time, for both startup and normal calls. Some workloads have multiple instances (k8s `replicas` 3-8) that have to support stop/pause/start mid-day. Hence the jitter, especially if the first attempt results in a timeout during a mid-day start.
- Keep it simple and understandable, so that less experienced devs and non-C# devs can reason about the retry call timing.
I can tweak the parameters to meet all the goals except the last one. There's just too much variation for me to reason about when a retry might happen after a failure, and I don't think I can explain this behavior to non-C# devs.