I'm deploying a serverless NLP app built on BERT. I'm currently using the Serverless Framework and AWS ECR container images to get around AWS Lambda's 250 MB deployment package limit (PyTorch alone already exceeds that).
I'm quite happy with this solution, as it lets me simply dockerize my app, push the image to ECR, and not worry about anything else.
One doubt I have is where to store the models. My app uses 3 different saved models, each around 422 MB. I have two options:
1. Copy the models into the Docker image itself (roughly as in the first sketch below).
- Pros: if I retrain a model, it is automatically updated when I rebuild and redeploy the app, and I don't have to use the AWS SDK to load objects from S3.
- Cons: the Docker image becomes very large.
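For reference, here's roughly what I mean by option 1. This is just a minimal sketch under my own assumptions: the model would be copied to a path like `/opt/models` in the Dockerfile, and I'm assuming Hugging Face `transformers`-style saved models (the path and model name are placeholders):

```python
# Hypothetical sketch of option 1: model baked into the image under /opt/models.
# Loaded at module import time so it is reused across warm invocations.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "/opt/models/model_a"  # placeholder path, COPY'd in via the Dockerfile

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def handler(event, context):
    inputs = tokenizer(event["text"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return {"logits": logits.tolist()}
```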
2. Store the models in S3 (roughly as in the second sketch below).
- Pros: the image is smaller than with the other solution (1+ GB vs 3+ GB).
- Cons: if I retrain the models I then need to update them on S3 manually, as they are decoupled from the app's deployment pipeline. I also need to download them from S3 with the AWS SDK at runtime (probably adding some cold-start overhead?).
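And here's roughly how I picture option 2: download the model files from S3 into `/tmp` on cold start and cache them for warm invocations. Again just a sketch, with the bucket name, key prefix, and `transformers` loading code being my own placeholders/assumptions (also, `/tmp` is only 512 MB by default, so caching all three models there wouldn't fit without increasing the ephemeral storage):

```python
# Hypothetical sketch of option 2: pull model files from S3 on cold start,
# cache them in /tmp, and reuse them across warm invocations.
import os
import boto3
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

S3_BUCKET = "my-models-bucket"   # placeholder bucket
S3_PREFIX = "bert/model_a"       # placeholder key prefix
LOCAL_DIR = "/tmp/model_a"       # Lambda's writable scratch space

s3 = boto3.client("s3")
_cache = {}

def _download_model():
    """Copy every object under the prefix into /tmp (cold start only)."""
    os.makedirs(LOCAL_DIR, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=S3_BUCKET, Prefix=S3_PREFIX):
        for obj in page.get("Contents", []):
            filename = os.path.basename(obj["Key"])
            if filename:
                s3.download_file(S3_BUCKET, obj["Key"], os.path.join(LOCAL_DIR, filename))

def _get_model():
    if "model" not in _cache:
        _download_model()
        _cache["tokenizer"] = AutoTokenizer.from_pretrained(LOCAL_DIR)
        _cache["model"] = AutoModelForSequenceClassification.from_pretrained(LOCAL_DIR)
        _cache["model"].eval()
    return _cache["tokenizer"], _cache["model"]

def handler(event, context):
    tokenizer, model = _get_model()
    inputs = tokenizer(event["text"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return {"logits": logits.tolist()}
```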
So my question ultimately is: which of the two approaches is the best practice, and why (or why not)? Or is there no single best practice at all, and it just comes down to preference / need?