
I've been facing extremely slow cold starts on a Lambda function deployed as a Docker container image behind an API Gateway.

Tech Stack:

For the deployment, I've been using AWS SAM with the following template file:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  demo

Resources:
  AppFunction:
    Type: AWS::Serverless::Function
    Properties:
      Timeout: 118
      MemorySize: 3008
      CodeUri: app/
      PackageType: Image
      Events:
        ApiEvent:
          Type: Api
          Properties:
            RestApiId: !Ref FastapiExampleGateway
            Path: /{proxy+}
            Method: ANY
            Auth:
              ApiKeyRequired: true
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: .
      
  FastapiExampleGateway:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      OpenApiVersion: '3.0.0'
      # Timeout: 30
      Auth:
        ApiKeyRequired: true
        UsagePlan:
          CreateUsagePlan: PER_API
          UsagePlanName: GatewayAuthorization

Outputs:
  Api:
    Description: "API Gateway endpoint URL for Prod stage for App function"
    Value: !Sub "https://${FastapiExampleGateway}.execute-api.${AWS::Region}.amazonaws.com/prod/"

The lambda is relatively light, with the following requirements installed:

jsonschema==4.16.0
numpy==1.23.3
pandas==1.5.0
pandas-gbq==0.17.8
fastapi==0.87.0
uvicorn==0.19.0
PyYAML==6.0
SQLAlchemy==1.4.41
pymongo==4.3.2
google-api-core==2.10.1
google-auth==2.11.0
google-auth-oauthlib==0.5.3
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.16.0
google-cloud-core==2.3.2
google-crc32c==1.5.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.4
mangum==0.11.0
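Several of these packages (numpy, pandas, SQLAlchemy, the google-cloud clients) are heavy to import, and in Python Lambdas the module-level import time typically dominates the cold start. As a quick check, this stdlib-only sketch times each import individually; the module list below is taken from the requirements above, and any package not installed locally is simply reported as missing:

```python
# Hedged sketch: time how long each heavy dependency takes to import.
# Run it inside the container image for realistic numbers. Note that
# imports are cached, so each module is only measured on its first import.
import importlib
import time

MODULES = ["numpy", "pandas", "sqlalchemy", "google.cloud.bigquery", "fastapi"]

def time_import(name):
    """Return seconds spent importing `name`, or None if it is not installed."""
    start = time.perf_counter()
    try:
        importlib.import_module(name)
    except ImportError:
        return None
    return time.perf_counter() - start

if __name__ == "__main__":
    for name in MODULES:
        elapsed = time_import(name)
        if elapsed is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: {elapsed:.2f}s")
```

For a full per-import breakdown, `python -X importtime -c "import api.main"` (or the `PYTHONPROFILEIMPORTTIME=1` environment variable mentioned in the comments below) prints the cumulative import tree to stderr.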

And the Dockerfile I'm using for the deployment is:

FROM public.ecr.aws/lambda/python:3.9

WORKDIR /code

RUN pip install --upgrade pip

COPY ./api/requirements.txt /code/api/requirements.txt

RUN pip install --no-cache-dir -r /code/api/requirements.txt

COPY ./api /code/api

ENV PYTHONPATH "${PYTHONPATH}:/code/"

EXPOSE 7777

CMD ["api.main.handler"]

This leads to a roughly 250 MB image.

On the first Lambda invocation, I'm seeing a long initialization time before launch (screenshot of the log timings omitted), which looks like a very long start before the actual Lambda execution. It reaches the point where API Gateway times out due to its maximum 30-second response limit!

  • Local tests using sam local start-api work fine.
  • I've tried increasing the lambda function RAM to higher values.

Not sure if this is a problem with Mangum (the adapter that wraps FastAPI for Lambda)?
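Mangum itself is a thin ASGI adapter and is rarely the bottleneck; the module-level imports usually are. One common mitigation is to defer a heavy import into the handler body so it is not paid at module load (cold start) time. The sketch below is illustrative, not the poster's code: `sqlite3` stands in for a heavy dependency like pandas or google.cloud.bigquery, and the function name is made up:

```python
# Hedged sketch of the lazy-import pattern. The import cost moves from
# Lambda init time to the first call of the function.
def handler_uses_heavy_dep(db_path=":memory:"):
    # Import inside the function: deferred until the first invocation,
    # and cached by Python for every call after that.
    import sqlite3  # stand-in for a heavy dependency such as pandas

    conn = sqlite3.connect(db_path)
    try:
        (value,) = conn.execute("SELECT 40 + 2").fetchone()
        return value
    finally:
        conn.close()
```

The trade-off is that the first real request pays the import cost instead of the init phase, so this helps most when only some routes need the heavy package.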

  • Please have a look at this: https://wa.aws.amazon.com/serv.question.PERF_1.en.html There is a section mentioning that you can add `PYTHONPROFILEIMPORTTIME=1` as an env variable to profile and understand which packages impact startup time. Also, one other option is to enable provisioned concurrency so that your lambda is warm when you invoke it. – brushtakopo Nov 18 '22 at 10:27
  • Seems to be taking a significant amount of time with the imports: 11:34:43 first import, 11:35:08 last import, and then the app starts up. Is this related to the Docker container startup @AnthonyB.? – Paulo Maia Nov 18 '22 at 11:44
  • Yes, Docker will add latency. Have a look at this: https://stackoverflow.com/questions/69512271/will-the-cold-starts-of-my-aws-lambda-function-take-longer-if-i-use-an-ecr-image If you cannot avoid containers, then use provisioned concurrency so your lambda is already warm and ready to serve requests. – brushtakopo Nov 18 '22 at 13:01
  • I agree with the above: don't containerise unless you have to, and explore provisioned concurrency to keep your Lambdas warm. It's also worth looking at static initialisation to avoid latency from making database connections etc. on every invocation. https://docs.aws.amazon.com/lambda/latest/operatorguide/static-initialization.html – monkee Nov 18 '22 at 23:34
  • Due to the file size of the Python packages I'm using (TensorFlow 2.0), a container seems to be the best option for me, from what I've seen in other posts. Provisioned concurrency looks nice but seems to have extra costs that are a bit harder to predict, which is something I would like to avoid. Would having EventBridge pings at a fixed frequency have a similar effect? – Paulo Maia Nov 21 '22 at 11:10
  • Great summary of hard technical info on what affects cold start durations in AWS Lambda over several languages and approaches: https://mikhail.io/serverless/coldstarts/aws/ – NeilG Jan 27 '23 at 07:29
  • Another good survey of some of the main limitations of AWS Lambda serverless architecture in terms of latency, expense of recreating connections etc but from a Node.js point of view: https://medium.com/wearesinch/the-challenges-of-aws-lambda-in-production-fc9f14b182be – NeilG Jan 27 '23 at 07:36
  • One more article just giving some quick advice on how to run a warmer to avoid cold starts and clarifying cold starts can arise due to no lambdas kept warm, but also: *concurrent lambdas* forcing new additional lambdas to cold start: https://epsagon.com/development/how-to-minimize-aws-lambda-cold-starts/ – NeilG Jan 27 '23 at 07:40
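Since several comments above suggest provisioned concurrency, this is a sketch of how it could be wired into the `AppFunction` resource from the template above. `AutoPublishAlias` is required for provisioned concurrency to apply, and the alias name `live` is an assumption, not from the original template:

```yaml
# Sketch only: provisioned concurrency keeps N initialized execution
# environments warm, at an hourly cost per provisioned environment.
AppFunction:
  Type: AWS::Serverless::Function
  Properties:
    # ...existing properties as in the template above...
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 1
```

The EventBridge "ping" approach asked about in the comments (a `Schedule` event invoking the function every few minutes) keeps at most one environment warm and does not help when concurrent requests force additional cold starts, whereas provisioned concurrency guarantees the configured number of warm environments.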

0 Answers