
We have deployed a model prediction service in production using FastAPI, and unfortunately some of the requests are failing due to a 10 s timeout. In terms of concurrent requests, we typically only see about 2 to 3 requests per second, so I wouldn't think that would put too much strain on FastAPI. The first thing we tried was to isolate the FastAPI framework from the model itself, and when we performed some tracing, we noticed that a lot of time (6 seconds) was spent in this segment: starlette.exceptions:ExceptionMiddleware.__call__.

The gunicorn configuration we are using didn't seem to help either:

"""gunicorn server configuration."""
import os
​
threads = 2
workers = 4
timeout = 60
keepalive = 1800
graceful_timeout = 1800
bind = f":{os.environ.get('PORT', '80')}"
worker_class = "uvicorn.workers.UvicornWorker"

I would really appreciate some guidance on what the above segment implies, and on what is causing timeout issues for some requests under a fairly light load.

[screenshots of the tracing output]

Riley Hun
  • Hi Riley, I'm facing the same issue now and wondering if you ever got it solved? Thanks in advance – MHL Apr 01 '22 at 03:09
  • Hello Leo, it only becomes an issue when using large NLP models. We solved the issue by using a GPU instead of a CPU. Additionally, as indicated below, Celery and Redis are meant for these types of tasks with long runtimes – Riley Hun Apr 02 '22 at 07:48
  • Thanks for the response, glad to hear it was solved by switching to GPU. Just wondering what `starlette.exceptions:ExceptionMiddleware.__call__` really does; is it actually a wrapper around the application code? – MHL Apr 04 '22 at 01:19

1 Answer


guidance on what the above segment implies

Here you have the official gunicorn example config file, with a lot of explanations included.

Since you use gunicorn to manage uvicorn workers, forcing your timeout to 60 seconds should work just fine for long-running tasks (although you should think about using an asynchronous task queue or job queue like Celery).

But what is your route returning? The first thing would be to look at the error thrown by your API.

starlette.exceptions:ExceptionMiddleware.__call__

Since you have expanded the list, you can see that what takes the most time (as expected) is neither FastAPI nor Starlette, but your function in app.api.routes.predictions.
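To see why the time shows up there, here is a simplified sketch of what an exception middleware does (an illustration of the wrapping idea, not Starlette's actual source): the middleware just awaits the rest of the application, so everything your route does is counted inside its __call__ in a trace.

# Simplified sketch of the wrapping idea, not Starlette's actual implementation.
from starlette.exceptions import HTTPException
from starlette.responses import PlainTextResponse

class ExceptionMiddlewareSketch:
    def __init__(self, app):
        self.app = app  # the rest of the ASGI stack, ending at your route function

    async def __call__(self, scope, receive, send):
        try:
            # the whole downstream app (router + your prediction code) runs inside
            # this await, so its runtime is attributed to __call__ in a trace
            await self.app(scope, receive, send)
        except HTTPException as exc:
            # expected exceptions are mapped to responses instead of crashing the worker
            response = PlainTextResponse(exc.detail, status_code=exc.status_code)
            await response(scope, receive, send)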

so I wouldn't think that would be too much strain on FastAPI

It is not too much strain on FastAPI, since FastAPI is not really involved in handling your request's workload. Remember that FastAPI is "just" a framework, so when your function takes time, it is your function/development that is at fault.

Here it can be one, or a combination, of these things that causes the long-running requests:

  • a sync route, or blocking code called from an async route (see the sketch after this list)
  • a blocking I/O call inside your route function
  • a prediction algorithm that takes a long time (maybe too long)
  • a worker-class configuration that does not suit your type of workload
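For instance, here is a minimal sketch of the sync/blocking point (the predict function and the route paths are placeholders, not your actual code). An async def route that calls blocking code directly stalls the event loop for every other request; a plain def route is run in FastAPI's threadpool automatically, or you can push the blocking call to the threadpool yourself:

from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

def predict(payload: dict) -> dict:
    # stand-in for the real, CPU-heavy model call
    return {"prediction": 0}

# Problematic: an async route calling blocking code directly
# blocks the event loop while the model runs.
@app.post("/predictions/bad")
async def predictions_blocking(payload: dict):
    return predict(payload)

# Better: a plain "def" route is executed in FastAPI's threadpool.
@app.post("/predictions/sync")
def predictions_sync(payload: dict):
    return predict(payload)

# Also fine: keep the route async but hand the blocking call to the threadpool.
@app.post("/predictions/async")
async def predictions_async(payload: dict):
    return await run_in_threadpool(predict, payload)

None of this makes the model itself faster, but it keeps one slow prediction from holding up every other request handled by the same worker.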

AI or NLP workloads often take a lot of processing time, so when integrating such models into an API you usually put them behind a task queue like Celery. If your API is not at fault and your route is not returning an error, and it simply takes a long time, you should look at implementing a task queue.
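For instance, a minimal sketch of that pattern, assuming Redis as broker and result backend (the names, routes and URLs are placeholders, not your actual setup). The route only enqueues the work and returns a task id, so the HTTP request never waits on the model:

from celery import Celery
from celery.result import AsyncResult
from fastapi import FastAPI

celery_app = Celery(
    "predictions",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task
def run_prediction(payload: dict) -> dict:
    # the slow model call runs in a Celery worker process, not in the API process
    return {"prediction": 0}

app = FastAPI()

@app.post("/predictions")
def enqueue_prediction(payload: dict):
    # return immediately with a task id instead of holding the HTTP request open
    task = run_prediction.delay(payload)
    return {"task_id": task.id}

@app.get("/predictions/{task_id}")
def get_prediction(task_id: str):
    result = AsyncResult(task_id, app=celery_app)
    if result.ready():
        return {"status": "done", "result": result.get()}
    return {"status": "pending"}

The client then polls the GET endpoint (or you push the result back through a webhook or websocket) once the worker has finished, so no request ever comes close to the 10 s timeout.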

Bastien B
  • This is super helpful - thank you! What is `starlette.exceptions:ExceptionMiddleware.__call__` though? That is also taking 6 seconds. Looks like an application-level issue. – Riley Hun Dec 03 '21 at 18:30
  • @Riley From the starlette doc => https://www.starlette.io/middleware/: ExceptionMiddleware - Adds exception handlers, so that particular types of expected exception cases can be associated with handler functions. For example raising HTTPException(status_code=404) within an endpoint will end up rendering a custom 404 page. – Bastien B Dec 04 '21 at 13:23