
I'm using Polly.Contrib for HttpClient retries.

var delay = Backoff.DecorrelatedJitterBackoffV2(
    medianFirstRetryDelay: TimeSpan.FromSeconds(1),
    retryCount: 4,
    fastFirst: true);

I want the maximum total wait to be ~32 seconds median (it could be much higher because of the randomness in the jitter, which is why I say median). I carefully read the docs, which give this as the maximum median wait for this API: "between 0 and f * 2^(t+1)".

Here f=1 and t=4, which comes out to max=32. But the WaitAndRetry docs seem to say the count doesn't include the fastFirst 1st retry? So if I want the max wait to be ~32 sec with fastFirst, is retryCount 4 or 5?


Update

Environment

  • I'm in kubernetes using microservices, using HTTP.
  • There are some single-instance workloads that manage state. If they hit OOM they could be down while restarting.
    • 12-30 sec if scheduled on a node that doesn't have the container image and has to download it before restarting
    • 5 sec restart if the container image is already on the k8s node

Goals by priority

  1. Retry the HTTP op on the restarted workload as quickly as possible, but without spamming one request per second.
  2. I'd also like to use fastFirst: true, which would be convenient for cases such as a SQL deadlock (UPDATE or DELETE on a busy table).
  3. If possible, use the same retry strategy all the time, for both startup and normal calls. Some workloads have multiple instances (k8s replicas 3-8) that have to support stop/pause/start mid-day. Hence jitter, especially if the 1st attempt results in a timeout during a mid-day start.
  4. Keep it simple and understandable, so less experienced devs and non-C# devs can reason about the retry call timing.

I can tweak parameters to meet all the goals except the last one. There's just too much variation for me to reason about when a retry might happen after a failure, and I don't think I can explain this behavior to non-C# devs.

yzorg
  • project wiki page covering this: https://github.com/Polly-Contrib/Polly.Contrib.WaitAndRetry#new-jitter-recommendation – yzorg Feb 06 '23 at 20:22
  • I'm having a hard time understanding what exactly your question is. Do you want to constrain the max time spent on retries to 32 seconds? Or something else? – Peter Csala Feb 07 '23 at 09:11
  • I'm asking if fastFirst is counted or ignored when calculating max median wait time (max time ignoring jitter). I want the overall operation to fail in under 60 sec (assuming each op time is negligible). This is primarily a retry in case the kubernetes workload being called is being restarted. It's a single deployment, so if down it will fail fast. – yzorg Feb 07 '23 at 11:08
  • If you set `fastFirst` then the first sleep duration will be 0. Please see the [related source code](https://github.com/Polly-Contrib/Polly.Contrib.WaitAndRetry/blob/master/src/Polly.Contrib.WaitAndRetry/Backoff.DecorrelatedJitterV2.cs#L50). – Peter Csala Feb 07 '23 at 12:41
  • I plan to use simple exponential for "normal" HTTP calls, only use Jitter variant during startup calls. Which means different HttpClient registrations for startup and normal/Controller use. – yzorg Feb 07 '23 at 13:46
  • It might make sense to play with the V1 version as well [`Backoff.ExponentialBackoff`](https://github.com/Polly-Contrib/Polly.Contrib.WaitAndRetry/blob/master/src/Polly.Contrib.WaitAndRetry/Backoff.Exponential.cs) – Peter Csala Feb 07 '23 at 13:59

2 Answers


Quite frankly, I don't fully understand your requirements, but here are my thoughts.

DecorrelatedJitterBackoffV2

This is a specialized sleep duration provider. Depending on the provided parameters, it generates a sequence of sleep durations as an IEnumerable<TimeSpan>, so it is iterable.

If you use this provider in your retry policy, then after each failed attempt the policy waits for the next value from this iterable before issuing the next attempt.

  • The 1st attempt fails (original action)
  • Policy evaluates whether it should trigger or not >> let's suppose it should
  • Sleeps for the duration given by the provider's first item
  • The 2nd attempt fails (first retry)
  • Policy evaluates whether it should trigger or not >> let's suppose it should
  • Sleeps for the duration given by the provider's second item
  • etc.
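
The flow above can be sketched roughly as follows. This is a minimal illustration, not the only way to wire it up: the class name RetryFlowSketch and the always-failing delegate are hypothetical stand-ins for a real HTTP call, and the parameter values are the ones from the question.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Contrib.WaitAndRetry;

public static class RetryFlowSketch
{
    // Builds a retry policy whose sleeps come from the jittered provider.
    public static IAsyncPolicy BuildRetryPolicy() =>
        Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(
                Backoff.DecorrelatedJitterBackoffV2(
                    medianFirstRetryDelay: TimeSpan.FromSeconds(1),
                    retryCount: 4,
                    fastFirst: true),
                // Invoked after a failed attempt, before the policy sleeps.
                onRetry: (exception, sleep) =>
                    Console.WriteLine($"Attempt failed ({exception.Message}); sleeping {sleep}"));

    public static async Task Main()
    {
        var attempts = 0;
        try
        {
            await BuildRetryPolicy().ExecuteAsync(() =>
            {
                attempts++;
                // Hypothetical stand-in for a call to a workload that is down.
                return Task.FromException(new HttpRequestException("service unavailable"));
            });
        }
        catch (HttpRequestException)
        {
            // Reached once the provider's sequence of delays is exhausted.
            Console.WriteLine($"Gave up after {attempts} attempts");
        }
    }
}
```

With retryCount: 4 the policy issues the original attempt plus at most four retries, and with fastFirst: true the first of those four sleeps is zero.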

Time constraints

The sleep duration provider controls only the delays between two attempts. In other words, it does not have any effect on how long a given attempt takes.

If you have a policy chain like Policy.WrapAsync(retryPolicy, timeoutPolicy), then you have constrained the individual attempts (including the original action). With this in hand you can calculate the worst-case scenario: at most how many retries can be issued, how long each attempt can take, and what the delays between two attempts are.

If you have a policy chain like Policy.WrapAsync(timeoutPolicy, retryPolicy), then you have constrained the overall time that can be spent on retries. This is an overarching time constraint. With this in hand, all you can say is when, in the worst case, the retry gives up. You don't know how many retry attempts can be issued during this period, since there is no explicit upper bound on each attempt.

You can combine the two approaches and create a policy chain like this:

Policy.WrapAsync(globalTimeoutPolicy, retryPolicy, perAttemptTimeoutPolicy);

Here you would have a limit for each attempt and for all attempts as a whole as well.

Combining policies

If you have a timeout policy as an inner policy and a retry as an outer policy, then you should alter your retry to trigger for timeouts as well. In Polly, the timeout policy throws a TimeoutRejectedException, not an OperationCanceledException.

So, you should add .Or<TimeoutRejectedException>() builder method call to your retry policy definition.
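
A rough sketch of the combined chain with the retry also handling timeouts could look like this. The 60-second overall budget comes from the comments on the question; the 5-second per-attempt budget, the class name TimeoutChainSketch and the GetAsync helper are assumptions made only for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Contrib.WaitAndRetry;
using Polly.Timeout;

public static class TimeoutChainSketch
{
    public static IAsyncPolicy BuildPolicy()
    {
        // Assumed budgets: 60 s for everything, 5 s for each single attempt.
        var globalTimeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(60));
        var perAttemptTimeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(5));

        var retry = Policy
            .Handle<HttpRequestException>()
            // The inner timeout surfaces as TimeoutRejectedException, so retry on it too.
            .Or<TimeoutRejectedException>()
            .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
                medianFirstRetryDelay: TimeSpan.FromSeconds(1),
                retryCount: 4,
                fastFirst: true));

        // Outermost first: overall limit, then retry, then per-attempt limit.
        return Policy.WrapAsync(globalTimeout, retry, perAttemptTimeout);
    }

    // Hypothetical helper: the token handed to the delegate is the one the
    // timeout policies cancel, so it must be passed on to HttpClient.
    public static Task<HttpResponseMessage> GetAsync(
        HttpClient client, string url, CancellationToken cancellationToken) =>
        BuildPolicy().ExecuteAsync(
            token => client.GetAsync(url, token),
            cancellationToken);
}
```

The timeout policies here use Polly's default optimistic strategy, which relies on the executed delegate honouring the cancellation token it receives.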


UPDATE #1

which parameters control WAIT TIME?

The medianFirstRetryDelay is used to calculate the next values. You can consider it as a seed for the exponential backoff function.

You can't control the max generated delay via the parameters.
But with the following simple wrapper you can:

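using System;
using System.Collections.Generic;
using Polly.Contrib.WaitAndRetry;

// Wraps the jittered provider and clamps every delay to maxDelay (default: 32 seconds).
IEnumerable<TimeSpan> GetCappedSleepDurations(TimeSpan? maxDelay = null)
{
    maxDelay ??= TimeSpan.FromSeconds(32);
    var delays = Backoff.DecorrelatedJitterBackoffV2(
        medianFirstRetryDelay: TimeSpan.FromSeconds(1),
        retryCount: 10,
        fastFirst: false);

    foreach (var delay in delays)
    {
        // Pass shorter delays through unchanged; replace longer ones with the cap.
        yield return delay < maxDelay.Value ? delay : maxDelay.Value;
    }
}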

If you print out the results, you should see something like this:

00:00:00.7942099
00:00:01.2404539
00:00:02.0858948
00:00:01.8598678
00:00:12.2435673
00:00:08.1000294
00:00:32
00:00:32
00:00:32
00:00:32
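
The same capping idea can also be kept separate from the provider, so that it works with any sequence of delays. The sketch below is one possible shape; the extension method name CapAt and the sample values are made up for illustration and are not part of Polly.

```csharp
using System;
using System.Collections.Generic;

public static class SleepDurationExtensions
{
    // Hypothetical helper: clamps every delay in the sequence to maxDelay.
    public static IEnumerable<TimeSpan> CapAt(
        this IEnumerable<TimeSpan> delays, TimeSpan maxDelay)
    {
        foreach (var delay in delays)
        {
            yield return delay < maxDelay ? delay : maxDelay;
        }
    }
}

public static class CapDemo
{
    public static void Main()
    {
        // Made-up sample delays; the last one exceeds the cap.
        var raw = new[]
        {
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(12),
            TimeSpan.FromSeconds(47),
        };

        foreach (var delay in raw.CapAt(TimeSpan.FromSeconds(32)))
        {
            Console.WriteLine(delay);
        }
    }
}
```

Because Backoff.DecorrelatedJitterBackoffV2 returns an IEnumerable<TimeSpan>, its result could be passed through such a helper before being handed to the retry policy.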
Peter Csala
  • I said explicitly "time doing retries". The scenario I'm working on I know the operation itself will fail fast (under 1 sec). You can ignore the operation time. Thanks for your time, but sorry that this answer is not very relevant to my OP, which was asking which parameters control WAIT TIME. – yzorg Feb 07 '23 at 10:58
  • clarified title to say max 32 sec in **waits** – yzorg Feb 07 '23 at 11:03
  • @yzorg I've updated my answer, please read the update #1 section. – Peter Csala Feb 07 '23 at 11:16

I wrote a .NET Fiddle to answer this.

https://dotnetfiddle.net/QgUqq0

I see what you mean @PeterCsala. I thought I could control the basic boundaries of the wait time just by varying the parameters (which seems to be implied by the docs). But the jitter is way more random than I thought. I didn't realize each wait could vary so widely; when summed, the total wait for count=4 or count=5 also varies widely. The median is also way lower than I expected.

And take this example output from the .NET Fiddle:

1 sec, retryCount: 4, firstFast: True
...
    00:00:00    00:00:02.4274782    00:00:01.1342445    00:00:05.9134879
    sum: 9.48 sec

Notice the 1.1 sec wait AFTER a 2.4 sec wait. I'd never expect that from an algorithm based on "exponential backoff". It's not rare.

I don't think I can explain this behavior to my other teammates (mostly Java and Python developers). So I think I'm going to have to use something simpler.
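
For comparison, a fixed doubling sequence without jitter is much easier to reason about. The sketch below is only one possible "simpler" shape, not necessarily what ends up being used; the helper name FixedExponentialDelays is hypothetical and the code is plain C# with no Polly dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimpleBackoff
{
    // Hypothetical helper: yields retryCount delays that double each time.
    // When fastFirst is true, the first delay is zero and counts toward retryCount.
    public static IEnumerable<TimeSpan> FixedExponentialDelays(
        TimeSpan firstDelay, int retryCount, bool fastFirst)
    {
        var remaining = retryCount;
        if (fastFirst && remaining > 0)
        {
            yield return TimeSpan.Zero;
            remaining--;
        }

        var delay = firstDelay;
        for (var i = 0; i < remaining; i++)
        {
            yield return delay;
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
    }

    public static void Main()
    {
        var delays = FixedExponentialDelays(
            TimeSpan.FromSeconds(1), retryCount: 6, fastFirst: true).ToList();

        // Delays are 0, 1, 2, 4, 8 and 16 seconds.
        Console.WriteLine(string.Join("  ", delays));
        Console.WriteLine($"sum: {delays.Sum(d => d.TotalSeconds)} sec");
    }
}
```

With a 1 sec first delay, retryCount 6 and fastFirst, the waits total 31 seconds, and every wait is at least as long as the previous one. The trade-off is that without jitter, replicas that fail at the same moment also retry at the same moments.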

Fiddle Output

Here's the entire output of the Fiddle:

1 sec, retryCount: 4, firstFast: True
    00:00:00    00:00:01.3908135    00:00:03.6539260    00:00:05.8674163
    sum: 10.91 sec
    00:00:00    00:00:02.8356851    00:00:02.0988871    00:00:05.5788317
    sum: 10.51 sec
    00:00:00    00:00:02.2710896    00:00:01.6223447    00:00:07.2301703
    sum: 11.12 sec
    00:00:00    00:00:01.8503867    00:00:02.3001164    00:00:05.9391929
    sum: 10.09 sec
    00:00:00    00:00:01.7118342    00:00:01.1271053    00:00:08.2654298
    sum: 11.10 sec
    00:00:00    00:00:01.5934536    00:00:01.4503908    00:00:06.0499969
    sum: 9.09 sec
    00:00:00    00:00:01.8295553    00:00:02.3008588    00:00:05.7575813
    sum: 9.89 sec
    00:00:00    00:00:01.6955469    00:00:02.2361896    00:00:03.1572746
    sum: 7.09 sec
    00:00:00    00:00:01.6458597    00:00:02.5933294    00:00:05.4516284
    sum: 9.69 sec
    00:00:00    00:00:02.6751631    00:00:00.3368498    00:00:06.5234695
    sum: 9.54 sec
    00:00:00    00:00:02.7957035    00:00:01.6330546    00:00:06.2968850
    sum: 10.73 sec
    00:00:00    00:00:02.4284071    00:00:03.0204295    00:00:01.5322586
    sum: 6.98 sec
    00:00:00    00:00:02.4912384    00:00:00.6106680    00:00:07.2869721
    sum: 10.39 sec
    00:00:00    00:00:01.5209607    00:00:04.0020657    00:00:04.1196119
    sum: 9.64 sec
    00:00:00    00:00:02.5200905    00:00:00.5708792    00:00:03.3635606
    sum: 6.45 sec
    00:00:00    00:00:01.9854414    00:00:03.0925564    00:00:01.6569407
    sum: 6.73 sec
    00:00:00    00:00:01.4412447    00:00:02.8806764    00:00:05.7825008
    sum: 10.10 sec
    00:00:00    00:00:02.0367938    00:00:02.2873604    00:00:04.3724692
    sum: 8.70 sec
    00:00:00    00:00:01.5629809    00:00:01.4608352    00:00:04.3596487
    sum: 7.38 sec
    00:00:00    00:00:01.9355692    00:00:02.2350188    00:00:04.3516659
    sum: 8.52 sec
1 sec, retryCount: 4, firstFast: False
    00:00:01.3290191    00:00:01.2202047    00:00:02.5260844    00:00:04.5466469
    sum: 9.62 sec
    00:00:00.7939010    00:00:00.7185783    00:00:03.2866291    00:00:04.4188223
    sum: 9.22 sec
    00:00:00.8904751    00:00:01.5267582    00:00:01.9186966    00:00:02.7913326
    sum: 7.13 sec
    00:00:00.6996185    00:00:00.7980595    00:00:01.8574831    00:00:03.6783710
    sum: 7.03 sec
    00:00:01.0187006    00:00:00.3737117    00:00:01.9765276    00:00:07.2216291
    sum: 10.59 sec
    00:00:01.3125672    00:00:01.3634863    00:00:01.8514689    00:00:02.7068033
    sum: 7.23 sec
    00:00:00.2745151    00:00:01.4576401    00:00:02.2336140    00:00:02.8430108
    sum: 6.81 sec
    00:00:01.1477933    00:00:00.6611546    00:00:03.8388333    00:00:00.3550863
    sum: 6.00 sec
    00:00:00.4087674    00:00:02.1939583    00:00:02.0896311    00:00:03.6630476
    sum: 8.36 sec
    00:00:01.3315603    00:00:01.3381451    00:00:00.2394113    00:00:04.5502384
    sum: 7.46 sec
    00:00:00.8031900    00:00:01.2612702    00:00:02.8399373    00:00:01.6098031
    sum: 6.51 sec
    00:00:01.1330562    00:00:01.1283540    00:00:00.9933361    00:00:02.4510240
    sum: 5.71 sec
    00:00:00.8316597    00:00:01.4024509    00:00:01.3783882    00:00:04.6346268
    sum: 8.25 sec
    00:00:00.7928881    00:00:01.5069425    00:00:01.1867454    00:00:03.5611036
    sum: 7.05 sec
    00:00:01.0969670    00:00:00.3476740    00:00:01.6399054    00:00:02.6186090
    sum: 5.70 sec
    00:00:00.3568919    00:00:01.2218998    00:00:03.3102970    00:00:01.5471265
    sum: 6.44 sec
    00:00:00.3974019    00:00:01.0342039    00:00:01.4164161    00:00:04.8298397
    sum: 7.68 sec
    00:00:00.9928894    00:00:01.1151883    00:00:01.7575383    00:00:05.1898116
    sum: 9.06 sec
    00:00:00.5711161    00:00:01.4577355    00:00:01.0355913    00:00:07.8069874
    sum: 10.87 sec
    00:00:00.3197731    00:00:01.7880796    00:00:00.9016027    00:00:06.9888528
    sum: 10.00 sec
0.5 sec, retryCount: 5, firstFast: True
    00:00:00    00:00:01.3546792    00:00:00.7053145    00:00:03.1648237    00:00:02.5794683
    sum: 7.80 sec
    00:00:00    00:00:00.7510212    00:00:01.4794709    00:00:02.8552752    00:00:03.2562177
    sum: 8.34 sec
    00:00:00    00:00:01.0308662    00:00:00.4891457    00:00:02.1505252    00:00:06.7936745
    sum: 10.46 sec
    00:00:00    00:00:01.0453633    00:00:00.8946597    00:00:01.7640227    00:00:04.8035002
    sum: 8.51 sec
    00:00:00    00:00:00.8090960    00:00:00.8605115    00:00:01.2240451    00:00:04.4458000
    sum: 7.34 sec
    00:00:00    00:00:01.2711564    00:00:01.2777960    00:00:02.5999912    00:00:04.8633200
    sum: 10.01 sec
    00:00:00    00:00:01.3006120    00:00:00.3349617    00:00:03.7801364    00:00:05.2703497
    sum: 10.69 sec
    00:00:00    00:00:01.1173007    00:00:01.3774040    00:00:00.8674170    00:00:05.6837357
    sum: 9.05 sec
    00:00:00    00:00:01.0098212    00:00:00.6186374    00:00:02.1325980    00:00:02.6069504
    sum: 6.37 sec
    00:00:00    00:00:00.9025802    00:00:01.5986111    00:00:03.1227179    00:00:05.1528147
    sum: 10.78 sec
    00:00:00    00:00:01.3149453    00:00:00.6077659    00:00:01.6087266    00:00:04.1025526
    sum: 7.63 sec
    00:00:00    00:00:00.7264336    00:00:01.9088567    00:00:00.8443737    00:00:04.4497130
    sum: 7.93 sec
    00:00:00    00:00:01.1333575    00:00:01.2017326    00:00:01.0382133    00:00:08.0170088
    sum: 11.39 sec
    00:00:00    00:00:00.8767042    00:00:00.9866298    00:00:01.1425718    00:00:05.9847881
    sum: 8.99 sec
    00:00:00    00:00:00.8028736    00:00:01.1827370    00:00:02.9544043    00:00:06.3254855
    sum: 11.27 sec
    00:00:00    00:00:00.7629045    00:00:01.6695153    00:00:01.5461169    00:00:03.3028042
    sum: 7.28 sec
    00:00:00    00:00:00.9148938    00:00:01.1600169    00:00:02.5117271    00:00:01.3261893
    sum: 5.91 sec
    00:00:00    00:00:01.1267806    00:00:00.4564534    00:00:02.4212050    00:00:06.3470234
    sum: 10.35 sec
    00:00:00    00:00:01.3391868    00:00:01.0126013    00:00:00.5598949    00:00:06.9540418
    sum: 9.87 sec
    00:00:00    00:00:01.0319109    00:00:01.6532412    00:00:00.8366237    00:00:05.0481157
    sum: 8.57 sec
1 sec, retryCount: 5, firstFast: True
    00:00:00    00:00:01.7036852    00:00:01.2349855    00:00:03.1197186    00:00:16.0773452
    sum: 22.14 sec
    00:00:00    00:00:02.2658393    00:00:02.9162426    00:00:05.9138160    00:00:09.9773757
    sum: 21.07 sec
    00:00:00    00:00:01.9868814    00:00:02.0973613    00:00:01.9061254    00:00:05.9060386
    sum: 11.90 sec
    00:00:00    00:00:02.0072594    00:00:00.8320461    00:00:03.4227425    00:00:08.6058840
    sum: 14.87 sec
    00:00:00    00:00:01.4876995    00:00:01.6473985    00:00:03.8461699    00:00:09.9558171
    sum: 16.94 sec
    00:00:00    00:00:01.5546186    00:00:03.2393016    00:00:03.5368912    00:00:09.9090074
    sum: 18.24 sec
    00:00:00    00:00:02.6238426    00:00:01.6294628    00:00:06.0053214    00:00:03.7444907
    sum: 14.00 sec
    00:00:00    00:00:02.5793887    00:00:01.6679957    00:00:03.5116129    00:00:10.1809312
    sum: 17.94 sec
    00:00:00    00:00:01.3904824    00:00:02.4303553    00:00:06.7797002    00:00:02.9103902
    sum: 13.51 sec
    00:00:00    00:00:01.5879946    00:00:04.0853268    00:00:00.6255776    00:00:05.6349109
    sum: 11.93 sec
    00:00:00    00:00:01.4367477    00:00:03.8219711    00:00:02.6701373    00:00:09.3499156
    sum: 17.28 sec
    00:00:00    00:00:01.7639718    00:00:03.2156965    00:00:06.3168010    00:00:04.0324835
    sum: 15.33 sec
    00:00:00    00:00:01.5818707    00:00:03.9788198    00:00:05.0944585    00:00:10.7327818
    sum: 21.39 sec
    00:00:00    00:00:01.7912938    00:00:01.5854307    00:00:04.2299682    00:00:11.6892054
    sum: 19.30 sec
    00:00:00    00:00:02.0104886    00:00:00.8440592    00:00:05.2010443    00:00:13.9254070
    sum: 21.98 sec
    00:00:00    00:00:02.6279233    00:00:00.3518791    00:00:06.6170555    00:00:12.1033067
    sum: 21.70 sec
    00:00:00    00:00:02.1354632    00:00:01.2299710    00:00:05.2079696    00:00:08.4223971
    sum: 17.00 sec
    00:00:00    00:00:01.4818039    00:00:02.6026511    00:00:03.2391221    00:00:04.3425045
    sum: 11.67 sec
    00:00:00    00:00:01.4059338    00:00:04.1479147    00:00:04.2587746    00:00:12.9392626
    sum: 22.75 sec
    00:00:00    00:00:01.9507035    00:00:02.1151440    00:00:04.6839676    00:00:06.0393732
    sum: 14.79 sec
1 sec, retryCount: 5, firstFast: False
    00:00:00.6788230    00:00:01.4203410    00:00:02.6245586    00:00:01.7675321    00:00:05.2430371
    sum: 11.73 sec
    00:00:00.6529607    00:00:00.8373318    00:00:01.8539233    00:00:02.6559658    00:00:09.7934878
    sum: 15.79 sec
    00:00:01.0441502    00:00:00.4420462    00:00:03.0931698    00:00:03.9559396    00:00:11.7269578
    sum: 20.26 sec
    00:00:00.7728833    00:00:01.8642782    00:00:00.3419784    00:00:06.8331030    00:00:12.1200427
    sum: 21.93 sec
    00:00:00.5639535    00:00:01.5233071    00:00:01.4933406    00:00:06.4421055    00:00:04.4632977
    sum: 14.49 sec
    00:00:01.1370886    00:00:00.2474113    00:00:02.6039481    00:00:05.7760278    00:00:05.6636975
    sum: 15.43 sec
    00:00:00.6823440    00:00:01.7238784    00:00:01.1997150    00:00:05.8404974    00:00:02.8699631
    sum: 12.32 sec
    00:00:00.7725670    00:00:01.6544065    00:00:02.8142650    00:00:04.2002013    00:00:11.5602015
    sum: 21.00 sec
    00:00:00.7588586    00:00:00.9165350    00:00:02.1185922    00:00:02.1696501    00:00:11.3773388
    sum: 17.34 sec
    00:00:00.8974586    00:00:01.4088610    00:00:03.2732610    00:00:02.8008394    00:00:14.0565266
    sum: 22.44 sec
    00:00:01.2319287    00:00:00.5067809    00:00:02.4601368    00:00:06.0496388    00:00:04.1287814
    sum: 14.38 sec
    00:00:00.9695386    00:00:00.8180148    00:00:02.0015145    00:00:02.8972007    00:00:07.5788910
    sum: 14.27 sec
    00:00:00.9019472    00:00:00.8590886    00:00:01.4123911    00:00:07.3444693    00:00:01.3780976
    sum: 11.90 sec
    00:00:01.3009539    00:00:00.5818398    00:00:03.4664505    00:00:02.0787111    00:00:14.7543293
    sum: 22.18 sec
    00:00:00.4767782    00:00:01.4488689    00:00:01.1506591    00:00:07.1873676    00:00:02.1024607
    sum: 12.37 sec
    00:00:01.0843076    00:00:01.2420571    00:00:01.1056183    00:00:03.9690401    00:00:08.3664444
    sum: 15.77 sec
    00:00:01.2792085    00:00:00.4144544    00:00:01.8363079    00:00:03.9080785    00:00:06.1009469
    sum: 13.54 sec
    00:00:00.4607123    00:00:01.0653234    00:00:02.4233058    00:00:02.5853980    00:00:06.3878357
    sum: 12.92 sec
    00:00:00.9477224    00:00:00.8579657    00:00:03.8231531    00:00:04.7891651    00:00:05.8468680
    sum: 16.26 sec
    00:00:00.4714534    00:00:01.8518566    00:00:02.5178993    00:00:06.5791304    00:00:01.0214230
    sum: 12.44 sec

--- SUMMARY ---------------------
1 sec, retryCount: 4, firstFast: True median wait: 9.23
1 sec, retryCount: 4, firstFast: False median wait: 7.84
0.5 sec, retryCount: 5, firstFast: True median wait: 8.93
1 sec, retryCount: 5, firstFast: True median wait: 17.29
1 sec, retryCount: 5, firstFast: False median wait: 15.94

yzorg