
I want to train a custom ML model with SageMaker. The model is written in Python and should be shipped to SageMaker in a Docker image. Here is a simplified version of my Dockerfile (the model code lives in train.py):

FROM amazonlinux:latest

# Install Python 3
RUN yum -y update && yum install -y python3-pip python3-devel gcc && yum clean all

# Install sagemaker-containers (the official SageMaker utils package)
RUN pip3 install --target=/usr/local/lib/python3.7/site-packages sagemaker-containers && rm -rf /root/.cache

# Copy the training script into the image
COPY train.py /opt/ml/code/train.py

# Tell sagemaker-containers which script to run
ENV SAGEMAKER_PROGRAM train.py

Now, if I use this image in a SageMaker estimator and call the estimator's fit method, I get the following error:

"AlgorithmError: CannotStartContainerError. Please make sure the container can be run with 'docker run train'."

In other words, SageMaker cannot start the container and run train.py. But why? Specifying the entry point with ENV SAGEMAKER_PROGRAM train.py is the approach recommended in the sagemaker-containers docs (see 'How a script is executed inside the container').
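The error message hints at how SageMaker launches a training container: it runs the image as 'docker run <image> train'. Without an ENTRYPOINT, Docker treats 'train' as the command to execute, so an executable named train has to be on the PATH inside the image. A minimal diagnostic sketch, assuming you copy it into the image and run it with python3 there; the function name find_train_command is hypothetical:

```python
import os
import shutil
from typing import Optional


def find_train_command(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable named 'train', or None.

    `path` defaults to the current PATH. SageMaker appends 'train' to
    `docker run`; with no ENTRYPOINT, Docker executes it as a command,
    so it must resolve via PATH lookup.
    """
    return shutil.which("train", path=path)


if __name__ == "__main__":
    found = find_train_command()
    if found:
        print(f"'train' resolves to {found}")
    else:
        print("No 'train' executable on PATH:", os.environ.get("PATH", ""))
```

For example, if the script is saved in the image as /opt/ml/code/check_train.py (a hypothetical path), 'docker run --rm --entrypoint python3 <image> /opt/ml/code/check_train.py' reports whether 'train' can be found.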

Joe

2 Answers


I found a hint in the AWS docs and came up with this solution:

ENTRYPOINT ["python3.7", "/opt/ml/code/train.py"]

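With this, the container runs as an executable. SageMaker starts it as 'docker run <image> train', and with the exec form of ENTRYPOINT the extra 'train' argument is simply passed on to train.py instead of being executed as a command.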
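Because SageMaker appends 'train' to the docker run command, a script launched through this ENTRYPOINT receives 'train' in sys.argv, and a plain argparse parse_args() call would reject it. The script is also no longer started by sagemaker-containers, so it has to read its hyperparameters itself; SageMaker writes them to /opt/ml/input/config/hyperparameters.json with every value stored as a string, and expects model artifacts in /opt/ml/model. A minimal sketch of a train.py entry point under those assumptions; the 'epochs' hyperparameter and the model.txt output file are hypothetical placeholders:

```python
import argparse
import json
import os
import sys
from typing import Dict, List, Optional

# Standard SageMaker training paths inside the container.
HYPERPARAMS_PATH = "/opt/ml/input/config/hyperparameters.json"
MODEL_DIR = "/opt/ml/model"


def load_hyperparameters(path: str = HYPERPARAMS_PATH) -> Dict[str, str]:
    """Read the hyperparameters SageMaker wrote; all values are strings."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # With an exec-form ENTRYPOINT, SageMaker's "train" arrives here
    # as an optional positional argument.
    parser.add_argument("mode", nargs="?", default="train")
    # Ignore any other arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(
    argv: Optional[List[str]] = None,
    hyperparams_path: str = HYPERPARAMS_PATH,
    model_dir: str = MODEL_DIR,
) -> int:
    args = parse_args(argv)
    if args.mode != "train":
        print(f"Unsupported mode: {args.mode}", file=sys.stderr)
        return 1
    hp = load_hyperparameters(hyperparams_path)
    epochs = int(hp.get("epochs", "1"))  # hypothetical hyperparameter
    # ... the actual model training would go here ...
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.txt"), "w") as f:
        f.write(f"trained for {epochs} epochs\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Using parse_known_args rather than parse_args keeps the script from exiting on arguments it does not define.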

Joe

I had the same error while using the 'sagemaker-training' toolkit. Specifying the package version in the 'pip install' command solved it for me, though I'm not sure why.