
I have a pipeline where an external system enqueues loads of messages every hour, typically around 1,000. If a specific action is triggered, this can be 50,000. I have a WebJob which consumes this queue and feeds the messages to a SQL server.

The WebJob concurrently consumes 8 messages and, as soon as a response has been received, consumes the next one. My issue is that this effectively eats all the resources the SQL server has. I am only interested in the queue eventually being emptied; there's no time constraint (other than that it should finish before the next batch comes in).

It is possible to set the NextVisibilityTime, but on its own that essentially does nothing (unless I play some clever trick at insertion time, which just seems very hacky). I tried implementing Polly retry logic like below:

    var _retryPolicy = Policy
        .HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
        .WaitAndRetryAsync(
            4,
            retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)), // 2s, 4s, 8s, 16s
            (outcome, timespan, context) =>
            {
                Logger.Log("Queue retry increased");
            });

The basic idea was that if the server could not keep up, it would return an error code, and I would then take a 'chill-pill' and wait a bit before retrying (with the backoff above, the waits would be 2s, 4s, 8s and 16s). This, however, very easily blows through the maximum execution time for a message, which is 30s. When that happens, Polly never gets to the point where it re-throws, and hence the message is not added to the poison queue.
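For illustration, the 'clever trick' mentioned above would amount to re-enqueueing the message with a visibility delay instead of waiting in-process, which would sidestep the 30s limit. This is only a sketch, assuming the WindowsAzure.Storage client; delaySeconds stands for a hypothetical retry counter carried in the payload:

    // Re-enqueue a copy of the message, hidden for a while, instead of
    // sleeping inside the handler (which counts against the 30s limit).
    CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
    CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("my-queue");

    await queue.AddMessageAsync(
        new CloudQueueMessage(payload),
        timeToLive: null,                                           // keep the default TTL
        initialVisibilityDelay: TimeSpan.FromSeconds(delaySeconds), // invisible until then
        options: null,
        operationContext: null);
    // The original message then completes normally and is deleted.

It would work, but it scatters retry state into the payload, which is the hacky part.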

Effectively, I'm looking for a way to spread out my consumption of messages intelligently. If this were a Function I could change how many concurrent reads I have, but simply setting it to consume only 1 or 2 messages seems like a naive approach as well.
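For reference, the WebJobs SDK does expose the same knobs; this sketch (assuming WebJobs SDK 2.x) is exactly that naive dial-it-down approach:

    // Queue concurrency knobs on the JobHost. Maximum parallelism is
    // BatchSize + NewBatchThreshold, so this configuration processes
    // at most 2 messages at a time.
    var config = new JobHostConfiguration();
    config.Queues.BatchSize = 2;          // messages fetched per batch
    config.Queues.NewBatchThreshold = 0;  // don't prefetch the next batch
    config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(30); // back off when idle

    new JobHost(config).RunAndBlock();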

Any ideas are appreciated.

Edit: Code for processing the message:

var response = await client.PostAsync(endpoint, new StringContent(payload, Encoding.UTF8, "application/json"));

Basically the endpoint is an HTTP POST to an API on top of the SQL server, which parses the payload of the message (a JSON data structure) into a POCO and inserts it into the database. This is subject to change, but it was built during .NET Core 1.0, when there were dependency reference issues between some of the libraries. The eventual intention is not to have to do this over an HTTP request.
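The rough shape of the endpoint is something like this (a sketch only; MessageDto, ToEntity and _db are hypothetical placeholders, not the actual code):

    // Parse the JSON payload into a POCO and insert it into the database.
    [HttpPost]
    public async Task<IActionResult> Post([FromBody] MessageDto dto)
    {
        _db.Messages.Add(dto.ToEntity()); // map the POCO to an entity
        await _db.SaveChangesAsync();     // one INSERT per message
        return Ok();
    }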

If the response is not a 200, the Polly retry logic kicks in.
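For completeness, the policy wraps the call via ExecuteAsync (standard Polly usage):

    var response = await _retryPolicy.ExecuteAsync(() =>
        client.PostAsync(endpoint, new StringContent(payload, Encoding.UTF8, "application/json")));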

Dennis
  • Please edit your question and share the code for processing the message as well. – Gaurav Mantri Nov 09 '17 at 07:26
  • @GauravMantri added it – Dennis Nov 09 '17 at 07:30
  • One alternative would be to have the throttling logic in the API itself. In the future, if your API receives such heavy traffic, if not just from the webjob, you will end up with the same problem. You can probably have some form of batching implemented in the API (storing the batches in a cache like Redis), which will do bulk updates to the DB and the batch size can be configured based on peak loads. – ThePretendProgrammer Nov 10 '17 at 16:46
  • Did you ever figure this out? I have a similar problem in which my WebJob is processing up to 15 messages at a time, and this is maxing out the DTU's on my Azure SQL Database. – Shawn de Wet Nov 20 '18 at 10:35
  • @ShawndeWet Some of the libraries have matured and we have been able to deal with this in another way that avoids it altogether - but basically we had to change the architecture of the services to make this happen in a better way. Durable Functions are great for some of this. But the basic change was to move from pushing data to a service, to having the service pull the data itself. – Dennis Nov 29 '18 at 09:10

0 Answers