
I would like to host a model on SageMaker using the new Serverless Inference.

I wrote my own container and inference handler following several guides. These are the requirements:

mxnet
multi-model-server
sagemaker-inference
retrying
nltk
transformers==4.12.4
torch==1.10.0

On non-serverless endpoints, this container works perfectly well. However, with the serverless version I get the following error message when loading the model:

ERROR - /.sagemaker/mms/models/model already exists.

The error is thrown by the following subprocess:

['model-archiver', '--model-name', 'model', '--handler', '/home/model-server/handler_service.py:handle', '--model-path', '/opt/ml/model', '--export-path', '/.sagemaker/mms/models', '--archive-format', 'no-archive']

So it seems to be something to do with the model-archiver (which I guess is a process from the MMS package?).

Richard

2 Answers


One possibility is that the serverless SageMaker version is trying to write the model to the same place where you have already written it in your inference container.

Maybe review your custom inference code and don't load the model there.
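For orientation, this is roughly what the handler registered by model-archiver looks like in an MMS-based container. This is a generic sketch of the standard MMS custom-service pattern, not the asker's actual code, and `_load_model` is a placeholder. The point is that the handler only reads from the model directory inside `initialize`, so it is worth checking whether anything else in the container writes to or re-exports that path.

```python
# handler_service.py sketch following the standard MMS custom-service pattern.
class ModelHandler:
    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # MMS passes the extracted model directory via the context object;
        # the handler should only read from it, never write back to it.
        model_dir = context.system_properties.get("model_dir")
        self.model = self._load_model(model_dir)
        self.initialized = True

    def _load_model(self, model_dir):
        # Placeholder: real code would e.g. call torch.load(...) or
        # transformers' from_pretrained(model_dir) here.
        return object()

    def handle(self, data, context):
        if not self.initialized:
            self.initialize(context)
        # data is a list of batched requests; return one response per item.
        # Real inference with self.model would go here.
        return ["placeholder-prediction" for _ in data]


_service = ModelHandler()


def handle(data, context):
    # Module-level entry point referenced as "handler_service.py:handle"
    # in the model-archiver command shown in the question.
    if data is None:
        return None
    return _service.handle(data, context)
```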

Daniel Wyatt
  • Thanks for your answer. I think SageMaker, or rather the `model-archiver`, is trying to write something here instead of loading. My model data is stored as a tar.gz in an S3 bucket. How could I load the model differently? – Richard Dec 14 '21 at 09:23
  • So with your custom inference code the model was copied into the docker container somehow ready to be used for inference. You either did this yourself in the code or maybe you used something like RegisterModel and this was done for you. I am just wondering if this is happening twice somehow. Do you know how your model gets copied into the docker container in the non-serverless architecture? – Daniel Wyatt Dec 15 '21 at 16:39
  • Thanks! I am digging deeper right now. It seems to be an issue with the multi-model server. The inference code I adapted is written using the MMS way of serving the model. I am currently trying to find a way of hosting it without MMS. – Richard Dec 16 '21 at 09:18

So the issue really was related to hosting the model using the SageMaker inference toolkit and MMS, which always uses the multi-model scenario, and that is not supported by Serverless Inference.

I ended up writing my own Flask API which actually is nearly as easy and more customizable. Ping me for details if you're interested.
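For context, a custom SageMaker serving container only needs to answer `GET /ping` and `POST /invocations` on port 8080, which is what the scikit_bring_your_own predictor.py linked in the comments below does. Below is a minimal Flask sketch along those lines; it assumes (these are assumptions, not details from the post) that the artifact extracted to /opt/ml/model is a Hugging Face text-classification model and that requests are JSON of the form `{"inputs": ["some text", ...]}`.

```python
# Minimal sketch of a Flask serving app for a custom SageMaker container.
import json

import flask
from transformers import pipeline

MODEL_DIR = "/opt/ml/model"  # SageMaker extracts model.tar.gz here

# Load the model once at container start-up (task is an assumption).
model = pipeline("text-classification", model=MODEL_DIR, tokenizer=MODEL_DIR)

app = flask.Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker calls this to decide the container is healthy.
    status = 200 if model is not None else 404
    return flask.Response(response="\n", status=status, mimetype="application/json")


@app.route("/invocations", methods=["POST"])
def invocations():
    # SageMaker forwards the InvokeEndpoint request body to this route.
    payload = flask.request.get_json(force=True)
    predictions = model(payload["inputs"])
    return flask.Response(
        response=json.dumps(predictions), status=200, mimetype="application/json"
    )


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In a real container this app is usually started by a small `serve` entrypoint behind gunicorn/nginx, as in the linked example, rather than with Flask's development server.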

Richard
  • Hi Richard, I am interested in your solution since I have a similar problem. Do you have it uploaded e.g. on GitHub? – Buggorilla Jan 18 '22 at 10:50
  • Hi Buggorilla! I am sorry, but this is for our company. For the Flask API I basically followed this tutorial: https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/predictor.py Hope that helps :) Otherwise ping me again – Richard Jan 19 '22 at 12:32
  • I'm pretty sure there's a way to have MMS work with a serverless endpoint. The MMS package doesn't assume that you will be in a multi-model scenario (there's a boolean for that). If you look at the Huggingface inference toolkit (https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/2f1fae5cbb3b68299e73cc591c0a912b7cccee29/src/sagemaker_huggingface_inference_toolkit/mms_model_server.py#L21), it works for serverless inference and yet they are using MMS – astiegler Jun 09 '22 at 21:45