I'm trying to deploy a model to a SageMaker endpoint using a custom Dockerfile:
ARG REGION=us-east-1
# Start from the AWS Deep Learning Containers PyTorch inference image for the given region
FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker

# Install dependencies into the system environment via Poetry (no virtualenv)
RUN pip install poetry
RUN poetry config virtualenvs.create false

# Scaffold a project at /opt/code with a `models` package
WORKDIR /opt/
RUN poetry new code --name models
WORKDIR /opt/code/
RUN poetry add json-lines sagemaker-inference

# Copy the model code and point SageMaker at the entrypoint script
ADD tuta models/tuta
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/code
ENV SAGEMAKER_PROGRAM models/tuta/sm_inference.py
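For reference, this is roughly how I deploy the endpoint with the SageMaker Python SDK (a minimal sketch; the image URI, S3 model data path, role ARN, endpoint name, and instance type are placeholders):

import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

# Placeholders: the ECR URI of the custom image built above, the S3 tarball
# containing tuta-ctc.bin and config.json, and a SageMaker execution role.
model = Model(
    image_uri="<account>.dkr.ecr.us-east-1.amazonaws.com/tuta-inference:latest",
    model_data="s3://my-bucket/tuta/model.tar.gz",
    role="arn:aws:iam::<account>:role/SageMakerExecutionRole",
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name="tuta-endpoint",
)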
models/tuta contains multiple model files (layers, metrics, etc.) along with the sm_inference.py file:
import json
import os

from models.tuta.inference import TUTAForCTC

JSON_CONTENT_TYPE = 'application/json'


def model_fn(model_dir):
    print("loading the model!")
    model = TUTAForCTC(
        model_bin=os.path.join(model_dir, "tuta-ctc.bin"),
        model_config_path=os.path.join(model_dir, "config.json"),
    )
    print("model loaded!")
    return model


def predict_fn(data, model):
    print("predicting...")
    # Echo the input for now to isolate the problem from the model itself
    return {"response": data}
    # return model.predict(data['hier_table'], data['flat_table'], data['table_range'])


def input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE):
    print("reading input...")
    return json.loads(serialized_input_data)


def output_fn(prediction, content_type):
    return prediction
The endpoint gets deployed and reaches the InService status, with a 200 response on ping. But as soon as I send a request, I get an error and the ping response becomes 500.
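For completeness, this is roughly how I send the request (a sketch; the endpoint name and payload fields are placeholders matching the examples above):

import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Invoke the deployed endpoint with a JSON body matching input_fn's expectations
response = runtime.invoke_endpoint(
    EndpointName="tuta-endpoint",
    ContentType="application/json",
    Body=json.dumps({"hier_table": [], "flat_table": [], "table_range": "A1:C3"}),
)
print(response["Body"].read().decode())

The full stack trace should be in the endpoint's CloudWatch log group (/aws/sagemaker/Endpoints/<endpoint-name>), but I haven't been able to make sense of the failure from there.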