I'm deploying the TheBloke/Llama-2-7b-Chat-GPTQ model on SageMaker, running the code in a SageMaker notebook instance, with an "ml.g4dn.xlarge" instance for deployment. I used the same code that is shown under the "Deploy → Amazon SageMaker" button on Hugging Face.
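For reference, the code from that button looks roughly like the sketch below (I may have a different TGI image version or timeout value; treat the exact version strings here as assumptions, not what I necessarily ran):

```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# IAM role attached to the notebook instance
role = sagemaker.get_execution_role()

# Model configuration passed to the TGI container
hub = {
    "HF_MODEL_ID": "TheBloke/Llama-2-7b-Chat-GPTQ",
    "SM_NUM_GPUS": json.dumps(1),  # ml.g4dn.xlarge has a single T4 GPU
}

# TGI (text-generation-inference) container image; version is an assumption
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# Deploy to a real-time endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    container_startup_health_check_timeout=300,
)
```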
After running the code, it processes for about 10 minutes and shows this output:

Output:
These dashes show the model is still deploying. After the dashes I get this error:
Error:
UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-08-24-06-51-13-816: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..
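Following the error's suggestion, I looked for the endpoint's CloudWatch logs with something like the command below (a sketch using the AWS CLI; the log group name is derived from the endpoint name in the error, which I believe is the convention SageMaker uses):

```shell
# Tail the CloudWatch log group for the failing endpoint
aws logs tail \
  /aws/sagemaker/Endpoints/huggingface-pytorch-tgi-inference-2023-08-24-06-51-13-816 \
  --since 1h
```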