We have a production scenario in which users invoke expensive NLP functions that run for short periods of time (say 30s). Because of the high load and intermittent usage, we're looking into deploying as Lambda functions. However, our packages are big.

I'm trying to fit AllenNLP into a Lambda function; AllenNLP in turn depends on PyTorch, SciPy, spaCy, NumPy, and a few other libs.

What I've tried

Following the recommendations made here and the example here, tests and additional files are removed. I also use a non-CUDA build of PyTorch, which gets its size down. With that, I can get an AllenNLP deployment package down to about 512 MB. Currently, this is still too big for AWS Lambda.
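
For what it's worth, the pruning step can be scripted. Below is a rough sketch, assuming the dependencies are vendored into a local build/ directory before zipping (the directory name and the prune lists are my own assumptions):

    import shutil
    from pathlib import Path

    BUILD_DIR = Path("build")
    PRUNE_DIRS = {"tests", "test", "__pycache__"}  # safe to drop from a deployment package
    PRUNE_SUFFIXES = {".pyc", ".pyo"}

    # Walk deepest paths first so files are handled before their parent dirs
    for path in sorted(BUILD_DIR.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if path.is_dir() and path.name in PRUNE_DIRS:
            shutil.rmtree(path, ignore_errors=True)
        elif path.is_file() and path.suffix in PRUNE_SUFFIXES:
            path.unlink()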

Possible fixes?

I'm wondering if anyone has experience with one of the following potential pathways:

  1. Cutting PyTorch out of AllenNLP. Without PyTorch, we're within reach of 250 MB. We only need to load archived models in production, but even that seems to use some of the PyTorch infrastructure. Maybe there are alternatives?

  2. Invoking PyTorch in (a fork of) AllenNLP as a second Lambda function (a sketch of what the front function might look like follows this list).

  3. Using S3 to deliver some of the dependencies: symlinking some of the larger .so files and serving them from an S3 bucket might help. This does create an additional problem: the Semantic Role Labelling model we're using from AllenNLP also requires language models of around 500 MB, for which the ephemeral storage could be used - but maybe these can be streamed directly into RAM from S3? (Also sketched after this list.)
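
For option 2, a minimal sketch of the front function, assuming the PyTorch-dependent code lives in a second Lambda (the function name "srl-pytorch-backend" and the payload shape are made up for illustration):

    import json
    import boto3

    lambda_client = boto3.client("lambda")

    def handler(event, context):
        # Synchronous call; the PyTorch-heavy work runs in the backend
        # function, keeping this function's package small
        response = lambda_client.invoke(
            FunctionName="srl-pytorch-backend",  # hypothetical name
            InvocationType="RequestResponse",
            Payload=json.dumps({"sentence": event["sentence"]}),
        )
        return json.loads(response["Payload"].read())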

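For option 3, streaming an object from S3 into memory (rather than onto disk) is straightforward with boto3. A sketch, with made-up bucket and key names, assuming the consuming code can accept a file-like object instead of a path:

    import io
    import boto3

    s3 = boto3.client("s3")

    def load_model_buffer(bucket="my-models-bucket", key="srl-model.tar.gz"):
        buf = io.BytesIO()
        # download_fileobj streams the object straight into RAM,
        # sidestepping the ephemeral storage limit (at the cost of memory)
        s3.download_fileobj(bucket, key, buf)
        buf.seek(0)
        return buf
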
Maybe I'm missing an easy solution. Any direction or experience would be much appreciated!

T. Altena

1 Answer

You could deploy your models to SageMaker inside of AWS and have Lambda call SageMaker (Lambda -> SageMaker), to avoid having to load very large packages into the Lambda function itself.

The architecture is explained here: https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
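
A minimal sketch of the Lambda side of that pattern, assuming a deployed endpoint (the endpoint name and payload shape here are hypothetical):

    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        response = runtime.invoke_endpoint(
            EndpointName="allennlp-srl-endpoint",  # hypothetical endpoint
            ContentType="application/json",
            Body=json.dumps({"sentence": event["sentence"]}),
        )
        return json.loads(response["Body"].read())

The model package and its dependencies then live on the SageMaker instance, so the Lambda itself only needs boto3, which is already included in the Lambda Python runtime.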

Joseph Lane