
The infrastructure for our web application looks like this:

Node.js web application -> GraphQL + Node.js middleware (backend for frontend) -> many backend services in Ruby on Rails -> DB/ES etc.

We have observed that the whole GraphQL + Node.js middleware layer becomes slow whenever any one of several crucial backend services becomes slow, and requests start queuing. When we compared this with the number of requests during the slow period, it was under 1k requests, which is much lower than the 10k concurrent requests Node.js is claimed to handle. We are looking for pointers to debug this issue further. Analysis done so far from our end:

  • According to Datadog and the other APM tools we use to monitor system health, CPU and memory usage show no abnormal behaviour when the servers become slow.
  • We use various request-tracking methods from the topmost layer to the last layer, and they confirm that request queuing happens only on this middleware layer.
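One further measurement that might narrow this down is sketched below, assuming the middleware runs on a Node.js version that ships `perf_hooks.monitorEventLoopDelay`. The helper names `createInFlightTracker` and `startLoopDelayMonitor` are illustrative, not taken from any library. The idea is to record event-loop delay alongside the number of upstream calls still awaiting a response: if loop delay stays low while the in-flight count climbs, the waiting is probably on upstream responses or connection limits rather than on CPU-bound work in the middleware.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Illustrative helper: counts upstream calls that have started but not yet settled.
function createInFlightTracker() {
  let inFlight = 0;
  let peak = 0;
  return {
    // Wrap any promise-returning upstream call, e.g. track(() => fetchFromService()).
    async track(fn) {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      try {
        return await fn();
      } finally {
        inFlight -= 1;
      }
    },
    snapshot() {
      return { inFlight, peak };
    },
  };
}

// Illustrative helper: samples event-loop delay with Node's built-in histogram.
function startLoopDelayMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    // The histogram reports nanoseconds; convert to milliseconds for logging.
    readMs() {
      return {
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
    },
    stop() {
      histogram.disable();
    },
  };
}
```

Logging `readMs()` and `snapshot()` together on a fixed interval during a slow period would show which of the two numbers moves first.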
Nitin Agrawal
  • @Evert, not sure why you felt it is a "ridiculously complex setup"; if you have been part of any large-scale web application you will notice this much complexity always exists, as it's pretty standard practice (for the simple reason of keeping the system decoupled). Regarding your suggestion of checking from the top and going downwards, we have been doing this via DataDog APM and found that CPU, memory, and IO remain much lower than thresholds. – Nitin Agrawal Feb 24 '21 at 07:21
  • @Evert thanks for your opinions, however instead of explaining how good/bad our architecture or team is, I would prefer to answer any doubts from my question. Looking for response from other community members having more in depth understanding of Nodejs. – Nitin Agrawal Feb 24 '21 at 10:07
  • Very well. What have you tried in terms of finding bottlenecks so far? There's a lot of material out there regarding general approaches towards identifying bottlenecks. I assume you looked, so maybe you can share some findings (even if they came out negative) – Evert Feb 24 '21 at 17:06
  • Also can you further describe the 'queuing' behavior? If you can, update your question rather than comment here. – Evert Feb 24 '21 at 17:10

0 Answers