I am deploying a Hugging Face model with a custom inference pipeline to SageMaker. My model.tar.gz has the following structure:
├── added_tokens.json
├── code
│   ├── inference.py
│   ├── pipeline.py
│   └── requirements.txt
├── config.json
├── generation_config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
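For reference, the archive can be packaged so that everything sits at the tarball root rather than inside a wrapping folder (a minimal sketch using Python's tarfile; my_model is a placeholder for the local directory holding the files listed above):

import os
import tarfile

model_dir = "my_model"  # placeholder: local directory containing the files listed above
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir(model_dir):
        # add each file/directory at the root of the archive (no wrapping folder)
        tar.add(os.path.join(model_dir, name), arcname=name)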
I deployed the model like this:
from sagemaker.huggingface.model import HuggingFaceModel

hub = {
    'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    model_data="s3://my_model_bucket/model.tar.gz",
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version='py310',
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge"
)
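For reference, the endpoint is invoked roughly like this (the payload shape here is a placeholder, since the actual input format is defined by my custom pipeline):

# placeholder payload; the real input format depends on MyCustomPipeline
payload = {"inputs": "Write a short poem about the sea."}
response = predictor.predict(payload)
print(response)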
However, when I invoke the endpoint, this is the response I get:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "/opt/ml/model does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//opt/ml/model/None\u0027 for available files."
}
The container logs show the same error:

W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - OSError: /opt/ml/model does not appear to have a file named config.json. Checkout 'https://huggingface.co//opt/ml/model/None' for available files.
Yet config.json is clearly present in my model archive. Here is my inference.py:
import torch
from typing import Dict
from transformers import AutoTokenizer, AutoModelForCausalLM
from pipeline import MyCustomPipeline

pipeline = None

def model_fn(model_dir):
    print("Loading model from: " + model_dir)
    tokenizer = AutoTokenizer.from_pretrained(
        model_dir,
        local_files_only=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        local_files_only=True,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )
    pipeline = MyCustomPipeline(model, tokenizer)
    return model, tokenizer

def transform_fn(model, input_data, content_type, accept):
    return pipeline(input_data)
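As a sanity check, the handler can be exercised locally against the extracted archive contents (a sketch; ./extracted_model is a placeholder path to the unpacked model.tar.gz):

# hypothetical local check: point model_fn at the unpacked archive contents
model, tokenizer = model_fn("./extracted_model")
print(type(model).__name__, type(tokenizer).__name__)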
What am I doing wrong here? As far as I can tell, I have followed all the steps needed to deploy a Hugging Face model on SageMaker.