I'm part of a Django project where we make many API calls to external services using the requests and zeep libraries. These calls are done sequentially and our API heavily relies on their immediate responses.

However, when our server experiences high traffic, it stops responding. It seems like all our workers become occupied, waiting for these external network calls to finish, leaving no workers available to process new incoming requests.

The project is large and converting the entire project to asynchronous isn't currently feasible. Furthermore, because of our need for immediate responses from these API calls, using a task queue system like Celery is not a solution.

As we anticipate even more traffic in the future, I'm looking for a solution that can efficiently handle these blocking network calls without blocking our server or requiring a complete rewrite to asynchronous.

I increased the number of gunicorn workers using the 2*CPU + 1 formula, but that might not be enough in the future.
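For reference, a minimal sketch of that worker formula as a gunicorn config file. The file name `gunicorn.conf.py` is assumed here, and the actual deployment may set the value on the command line instead:

```python
# gunicorn.conf.py (assumed file name; gunicorn config files are plain Python)
import multiprocessing

# The 2*CPU + 1 rule of thumb for the number of worker processes.
workers = multiprocessing.cpu_count() * 2 + 1
```

With the default sync worker class, each of these workers handles one request at a time, so a worker waiting on an external call cannot serve anything else.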

Azikdev
  • Because you are depending on an external service, it's all out of your hands. It sounds like you built this all around immediate responses, which are never guaranteed, and then chained multiple of these together, which was fundamentally the wrong thing to do. I don't think there is a band-aid solution for this type of fundamental problem. It should have been threaded/async from the start. – Nealium Jul 26 '23 at 21:00

1 Answer

I have two suggestions in mind, but neither will completely eliminate the blocking nature of synchronous I/O. If your traffic continues to increase, you may ultimately need to consider refactoring your application to use async I/O, ideally in combination with Celery.

  1. Use eventlet or gevent: These libraries use lightweight green threads (cooperative, not truly parallel) to handle I/O-bound work, such as network requests, more efficiently. They work with synchronous code, so there's no need to rewrite your entire application. If you're making the requests with Python's requests library, I would also suggest switching to something like a ThreadPoolExecutor or grequests, so that independent calls run concurrently instead of one after another.
  2. Caching: Cache as much as possible. This is especially useful if you're receiving a lot of requests for the same data, as it prevents the system from blocking on already known results.
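As a rough illustration of the ThreadPoolExecutor idea from the first suggestion, here is a minimal sketch. The helper `run_concurrently` and the stand-in `fake_external_call` are hypothetical names invented for this example; in a real view the callables would wrap your own requests or zeep calls, and this only helps when those calls do not depend on each other's results:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(calls, max_workers=8):
    """Run zero-argument callables in threads; return results in input order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(calls))) as pool:
        futures = [pool.submit(call) for call in calls]
        # result() re-raises any exception raised inside the call.
        return [future.result() for future in futures]


def fake_external_call(name, delay=0.2):
    """Hypothetical stand-in for a blocking call such as requests.get(...)."""
    time.sleep(delay)  # simulates network latency
    return f"{name}: ok"


# In a Django view this might look like (URLs are placeholders):
#   results = run_concurrently([
#       lambda: requests.get(url_a, timeout=5),
#       lambda: requests.get(url_b, timeout=5),
#   ])
```

The total wait then becomes roughly that of the slowest call rather than the sum of all calls. The gunicorn worker is still occupied for that time, so this shortens each request without removing the blocking, and setting a timeout on every external call remains important.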

While these solutions can somewhat mitigate the problem, be aware that they may require more CPU resources as the application grows. I highly recommend trying to convert some critical components to async.
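The caching suggestion can be sketched with a small in-process cache that expires entries after a fixed time. `TTLCache` is a hypothetical helper written for this example, not part of Django; in a Django project the built-in cache framework (`django.core.cache.cache.get_or_set`) with a shared backend would normally play this role, since an in-process dict is separate for every gunicorn worker:

```python
import threading
import time


class TTLCache:
    """Tiny thread-safe cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._data = {}  # key -> (expires_at, value)
        self._lock = threading.Lock()

    def get_or_set(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = time.monotonic()
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]
        # Call the slow loader outside the lock so other keys are not blocked.
        value = loader()
        with self._lock:
            self._data[key] = (time.monotonic() + self._ttl, value)
        return value
```

A view could then wrap an external lookup as `cache.get_or_set("rates", fetch_rates)`, where `fetch_rates` is whatever function performs the blocking call. This is only appropriate for responses that may safely be a few seconds or minutes old.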

JoVi