I have the following setup: a web API running in a couple of Linux containers (built on ASP.NET Core 2.0.5) behind an AWS Application Load Balancer.

A client makes requests to this API using HttpClient, issuing the calls from multiple tasks in parallel. As the number of parallel tasks increases, the API starts throwing exceptions with the message "Request timed out."

The call stack is:

    at Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.PipeCompletion.ThrowFailed()
    at Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.GetResult(ReadResult& result)
    at Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.IReadableBufferAwaiter.GetResult()
    at Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.ReadableBufferAwaitable.GetResult()
    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.d__24.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame`1.d__2.MoveNext()

What can cause this error? Who is setting the timeout, and how can I investigate this further?

Edit: When I use Fiddler to capture the failing request, the error is no longer thrown (maybe Fiddler opens the connections with a lower degree of parallelism).
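One way to test the parallelism hypothesis without Fiddler is to cap the number of requests in flight on the client. The following is a minimal sketch, not the code from my client: the `Throttle` class and `RunAsync` method are hypothetical names, and it assumes a `SemaphoreSlim` gate is an acceptable way to bound concurrency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs async jobs with a bounded degree of parallelism.
public static class Throttle
{
    // Starts every job, but lets at most maxParallel (must be >= 1) run at once.
    // Results are returned in the same order as the input jobs.
    public static async Task<T[]> RunAsync<T>(IEnumerable<Func<Task<T>>> work, int maxParallel)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = work.Select(async job =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try { return await job(); }
                finally { gate.Release(); }  // free the slot even if the job throws
            }).ToList();                     // ToList starts all the wrapper tasks

            return await Task.WhenAll(tasks);
        }
    }
}
```

With a shared `HttpClient` named `client`, a call could look like `await Throttle.RunAsync(urls.Select(u => (Func<Task<HttpResponseMessage>>)(() => client.GetAsync(u))), 16);`. Lowering or raising the last argument should show whether the failures track the number of concurrent requests.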

mslliviu
  • Have you tried increasing `httpClient.Timeout`? Also, do you have this problem with a specific endpoint or with all kinds of requests? – Set Mar 08 '18 at 10:53
  • I've tried different settings for httpClient.Timeout (and also combinations of closing the connection or keeping it open), but it didn't make any difference. The method that fails does some lengthy filesystem operations. I created another one that just waits for some time (await Task.Delay()), and that one doesn't throw the same error. – mslliviu Mar 08 '18 at 11:43
  • How long do these filesystem operations take? If it's more than a few seconds, this is a design flaw. Lengthy operations should never be handled by your web server, you should kick off some type of background task -- drop a flag in a database, put a message on a queue, kick off a web job (since you're on Azure), etc. If the duration is under 5 minutes, personally I'd set up a queue that triggers a Function and take advantage of the dirt-cheap auto-scaling consumption plan for Function Apps. – McGuireV10 Mar 08 '18 at 12:35
  • Oops, I guess AWS is Amazon, not Azure. Still, the basic concept applies. – McGuireV10 Mar 08 '18 at 12:36
  • It's Amazon, with operations against an EFS file system. On this file system one operation normally takes around 30 ms, but when a lot of requests come in, the time increases a lot, up to seconds per request. There are workarounds, but I'm trying to understand the timeout. – mslliviu Mar 08 '18 at 13:15
  • Ah ok. Have you confirmed all the calls from the client actually reach the host? What kind of client issues the calls? I'm wondering if maybe you're hitting a concurrent connection limit (but probably not if the client is .NET Core, the new default in Core is `int.MaxValue`). – McGuireV10 Mar 08 '18 at 14:00
  • It would be very helpful to see the entire error and not just the call stack. Where is the error being reported? Is it on the client? – Alon Catz Mar 08 '18 at 14:18
  • @McGuireV10 It seems that all requests are reaching the host. The calls are made from a netcoreapp using an HttpClient shared across requests. I don't think it is a connection limit: with a simple controller action that only waits, I managed to run 10x more requests. – mslliviu Mar 08 '18 at 17:10
  • @AlonCatz The only message in the error is "Request timed out", plus the stack mentioned above. I log it on the server side, and it is then returned to the client as a 500 response. – mslliviu Mar 08 '18 at 17:12

1 Answer

One possible cause is that I/O-bound operations are blocking ASP.NET threads. From the problem description, it seems your server handles requests up to a certain level of concurrency and cannot keep up beyond that.

I think that if you use async/await correctly, you can serve more concurrent requests than your current limit.
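As an illustration of what "correctly" means for file access, here is a minimal sketch. The `FileReader` class and `ReadAllAsync` method are hypothetical names; the point is only that the stream is opened with `useAsync: true` and awaited, rather than read with a blocking call or `.Result`. Whether this actually frees threads depends on the platform and file system, since the runtime may still service such reads with thread pool threads on some systems.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: reads a whole file without blocking the calling thread.
public static class FileReader
{
    public static async Task<byte[]> ReadAllAsync(string path)
    {
        // useAsync: true asks FileStream for asynchronous I/O where available.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4096, useAsync: true))
        using (var buffer = new MemoryStream())
        {
            // The request thread is released while the copy is awaited.
            await stream.CopyToAsync(buffer);
            return buffer.ToArray();
        }
    }
}
```

A controller action would then `await FileReader.ReadAllAsync(path)` all the way up the call chain, with no `.Result` or `.Wait()` in between.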

Also check whether you can increase the number of ASP.NET threads through configuration. Traditional ASP.NET supports this; I'm not sure whether ASP.NET Core offers the same facility.
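In .NET Core the thread pool minimum can be raised in code with `ThreadPool.SetMinThreads`. The sketch below is a hedged example: `ThreadPoolTuning` and `RaiseMinWorkerThreads` are hypothetical names, and the right value is workload-specific (a comment in this thread reports using 800, and another links a post cautioning against raising the minimum).

```csharp
using System;
using System.Threading;

// Hypothetical helper: raises the worker-thread minimum of the thread pool.
public static class ThreadPoolTuning
{
    // Returns true if the minimum is at least minWorkers after the call.
    public static bool RaiseMinWorkerThreads(int minWorkers)
    {
        ThreadPool.GetMinThreads(out int currentWorkers, out int currentIo);
        if (currentWorkers >= minWorkers)
        {
            return true; // already high enough, leave it alone
        }

        // Keep the I/O completion thread minimum unchanged.
        // SetMinThreads returns false if the value is rejected
        // (for example, if it exceeds the configured maximum).
        return ThreadPool.SetMinThreads(minWorkers, currentIo);
    }
}
```

This would typically be called once at startup, for example at the top of `Main` before the web host is built.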

Finally, since you are running in the cloud, check whether upgrading your server configuration is feasible; a larger instance may help you serve more requests. This should be the last resort, though. First focus on improving your application's performance on the existing hardware.

parag
  • I'm using the async/await pattern throughout the entire API. I tried playing around with the thread count (now set in the libuv options), but it made no difference. I will upgrade the machines to something more powerful to see what impact this has. I'll come back with details. – mslliviu Mar 08 '18 at 17:15
  • It seems it was something related to the number of threads. Just upgrading the server showed no improvement. I managed to improve the performance by setting the ThreadPool minimum threads (`ThreadPool.SetMinThreads`) to a much higher value (800 in my case) and also raising the "nofile" limit on the container (it being Linux). I'm not sure, but I hope this was the issue behind the "Request timed out" message. – mslliviu Mar 15 '18 at 08:07
  • Raising min threads is bad: https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/ Did you solve this problem? – xSx Nov 07 '19 at 06:39