My .NET Core 3.1 app uses Polly 7.1.0 retry and bulkhead policies for HTTP resilience. The retry policy uses HandleTransientHttpError() to catch a possible HttpRequestException.
Now, HTTP requests fired with MyClient sometimes throw an HttpRequestException. About half of these are caught and retried by Polly. The other half, however, ends up in my try-catch block, and I have to retry those requests manually. This happens before the maximum number of retries is exhausted.
How did I manage to create a race condition that prevents Polly from catching all of these exceptions? And how can I fix it?
I register the policies with IHttpClientFactory as follows.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient<MyClient>(c =>
    {
        c.BaseAddress = new Uri("https://my.base.url.com/");
        c.Timeout = TimeSpan.FromHours(5); // Generous timeout to accommodate retries
    })
    .AddPolicyHandler(GetHttpResiliencePolicy());
}
private static AsyncPolicyWrap<HttpResponseMessage> GetHttpResiliencePolicy()
{
    var delay = Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromSeconds(1), retryCount: 5);

    var retryPolicy = HttpPolicyExtensions
        .HandleTransientHttpError() // This should catch HttpRequestException
        .OrResult(msg => msg.StatusCode == HttpStatusCode.NotFound)
        .WaitAndRetryAsync(
            sleepDurations: delay,
            // Lambda parameters renamed so they no longer shadow the local "delay"
            onRetry: (outcome, sleepDuration, retryCount, context) => LogRetry(outcome, retryCount, context));

    var throttlePolicy = Policy.BulkheadAsync<HttpResponseMessage>(maxParallelization: 50, maxQueuingActions: int.MaxValue);

    return Policy.WrapAsync(retryPolicy, throttlePolicy);
}
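To make the intent of the retry policy explicit, here is a stdlib-only sketch of the outcome filter it is configured with. The class RetryPredicateSketch and its method IsRetryable are hypothetical and not part of Polly; the rules mirror what HandleTransientHttpError() is documented to handle (HttpRequestException, 5xx, 408) plus the extra 404 check added by OrResult. Keep in mind that the policy can only apply this filter to outcomes produced while its handler is awaiting the inner SendAsync call.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper, NOT Polly's API: spells out which outcomes the retry
// policy above treats as retryable.
public static class RetryPredicateSketch
{
    public static bool IsRetryable(Exception exception, HttpResponseMessage response)
    {
        // Exception branch: HandleTransientHttpError() only handles HttpRequestException.
        if (exception != null)
            return exception is HttpRequestException;

        if (response == null)
            return false;

        return (int)response.StatusCode >= 500                       // 5xx server errors
            || response.StatusCode == HttpStatusCode.RequestTimeout  // 408
            || response.StatusCode == HttpStatusCode.NotFound;       // 404, added via OrResult
    }
}
```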
The MyClient that fires the HTTP requests looks as follows.
public async Task<TOut> PostAsync<TOut>(Uri requestUri, string jsonString)
{
    try
    {
        using (var content = new StringContent(jsonString, Encoding.UTF8, "application/json"))
        using (var response = await httpClient.PostAsync(requestUri, content)) // This throws HttpRequestException
        {
            // Handle response
        }
    }
    catch (HttpRequestException ex)
    {
        // This should never be hit, but unfortunately is
    }
}
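One way to narrow down where the uncaught failures come from is to record the exceptions that actually pass through the handler pipeline and compare them with the ones caught in PostAsync. Below is a minimal diagnostic sketch. ExceptionProbeHandler and AlwaysFailsHandler are hypothetical names; the probe is assumed to be registered after AddPolicyHandler (for example with .AddHttpMessageHandler(() => new ExceptionProbeHandler())), so that it sits inside the Polly handler and sees every attempt.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic handler: records every exception thrown by the inner
// handlers while SendAsync runs, then rethrows it unchanged so that outer
// handlers (such as the Polly policy handler) behave exactly as before.
public class ExceptionProbeHandler : DelegatingHandler
{
    // In a real app you would log here instead of collecting in memory.
    public ConcurrentQueue<Exception> Seen { get; } = new ConcurrentQueue<Exception>();

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        try
        {
            return await base.SendAsync(request, cancellationToken);
        }
        catch (Exception ex)
        {
            Seen.Enqueue(ex);
            throw;
        }
    }
}

// Hypothetical stand-in for the network: fails every request so the probe
// can be exercised without a server.
public class AlwaysFailsHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromException<HttpResponseMessage>(new HttpRequestException("Simulated network failure"));
}
```

If an HttpRequestException caught in PostAsync never shows up in the probe, it was raised outside the handler pipeline, where no policy added with AddPolicyHandler can see it.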
Here is some additional information, although I'm not sure that it's relevant.
- Since the HttpClient is DI-registered transiently, there are 10 instances of it flying around per unit of work.
- Per unit of work, the client fires ~400 HTTP requests.
- The HTTP requests are lengthy (5 min duration, 30 MB response).