I am working on a machine learning project with the MERN stack (MongoDB, Express, ReactJS, Node.js). The AWS keys (access key and secret key) are stored in the MongoDB configuration, and the code that trains the model is written in Node.js and Express.
Problem: After I upload a .csv or .xls file, the training job does not complete. In the AWS account it shows as failed with the following error:
AlgorithmError: CannotStartContainerError. Please make sure the container can be run with 'docker run train'. Please refer SageMaker documentation for details. It is possible that the Dockerfile's entrypoint is not properly defined, or missing permissions.
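Since the error says the container must be runnable with 'docker run train', the same start-up can be tried locally before a training job is submitted. What follows is only a minimal sketch under stated assumptions: the function name run_like_sagemaker and the DOCKER override variable are hypothetical helpers, not part of the project, and the image name passed in is a placeholder. The meanings given for exit statuses 125, 126 and 127 are the ones documented for docker run.

```shell
#!/bin/sh
# Sketch: start an image with the single argument 'train', as the error
# message describes, and report how the container start-up ended.
# DOCKER is a hypothetical override so the function can be dry-run with a stub.
DOCKER="${DOCKER:-docker}"

run_like_sagemaker() {
  image="$1"   # placeholder, e.g. the URI that was pushed to ECR
  "$DOCKER" run --rm "$image" train
  status=$?
  case "$status" in
    0)   echo "OK: container ran 'train' and exited cleanly" ;;
    125) echo "FAIL: docker itself could not run the container" ;;
    126) echo "FAIL: entrypoint exists but cannot be invoked" ;;
    127) echo "FAIL: entrypoint executable not found in the image" ;;
    *)   echo "FAIL: container exited with status $status" ;;
  esac
  return "$status"
}

# Usage (not executed here):
#   run_like_sagemaker <account_id>.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
```

If the local run fails with status 126 or 127, the ENTRYPOINT line of the Dockerfile and the files actually present in the image are the first things worth comparing.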
For the machine learning part I have done the following setup:
S3 - Created an access key and secret key and created a bucket. - The keys, bucket name and region are stored in the MongoDB configuration.
SageMaker -> Notebook Instance - Created a notebook instance; it shows Pending status. - This was done in the AWS account.
SageMaker -> IAM Role - Created an IAM role. - This was done in the AWS account.
SageMaker -> Model - Created a model and obtained the model ARN and role ARN. - This was done in the AWS account.
ECR (Elastic Container Registry) - Created repositories named linear-learner and xgboost. - This was done in the AWS account.
Dockerfile - Created a Dockerfile. - Saved in the project folder.
Docker Hub - Created a Docker Hub account.
SNS credentials - SNS key and topic ARN. - Stored in the MongoDB configuration.
I have granted the following permissions:
Attached directly: AmazonSageMakerFullAccess, AmazonS3FullAccess, AmazonSNSFullAccess, AmazonEC2ContainerRegistryReadOnly
Attached from group: AmazonEC2FullAccess, AmazonDynamoDBFullAccess, AmazonMachineLearningFullAccess, AdministratorAccess, AWSElasticBeanstalkFullAccess, AmazonSageMakerFullAccess
Dockerfile
FROM ubuntu
RUN apt-get update
RUN apt-get install curl -y
RUN curl -sL https://deb.nodesource.com/setup_10.x -o nodesource_setup.sh
RUN bash nodesource_setup.sh
RUN apt install nodejs -y
WORKDIR /usr/app
COPY . /usr/app/
RUN npm install
EXPOSE 3000
ENTRYPOINT [ "python3.7", "/opt/ml/code/train.py" ]
Code images:
Linear : <account_id>.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest
XgBoost : <account_id>.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
I ran the following commands to build, tag and push the images from the Dockerfile:
-> $ docker build -t <codeimage>:<tag> .
Successfully built <id>
Successfully tagged <codeimage>:<tag>
-> $ docker build -t <account.id>.dkr.ecr.<region>.amazonaws.com/<codeimage> .
Successfully built <id>
Successfully tagged <url>
-> $ docker images
(lists all images with their tags)
-> $ docker tag [image name]:[tag] [repository URI]
-> $ sudo aws ecr get-login-password | sudo docker login --username AWS --password-stdin [account.id].dkr.ecr.[region].amazonaws.com
Login Succeeded
-> $ sudo docker push [account.id].dkr.ecr.[region].amazonaws.com/[repository name]
<id> Pushed
<id> Pushed
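For reference, the build, login and push steps above can be collected into one script. This is a rough sketch rather than the exact commands that were run: the function names (ecr_registry, ecr_image_uri, build_and_push) and the DOCKER and AWS override variables are hypothetical, and the account ID in the usage comment is a placeholder. It assumes the AWS CLI is already configured with credentials for the target account.

```shell
#!/bin/sh
# Sketch: build an image under its ECR URI, log in to the registry, and push.
# DOCKER and AWS are hypothetical overrides so the flow can be dry-run with stubs.
DOCKER="${DOCKER:-docker}"
AWS="${AWS:-aws}"

# Registry host for an account and region: <account>.dkr.ecr.<region>.amazonaws.com
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Full image URI; the tag defaults to 'latest' when the fourth argument is omitted.
ecr_image_uri() {
  printf '%s/%s:%s' "$(ecr_registry "$1" "$2")" "$3" "${4:-latest}"
}

build_and_push() {
  account="$1"; region="$2"; repo="$3"; tag="${4:-latest}"; context="${5:-.}"
  uri="$(ecr_image_uri "$account" "$region" "$repo" "$tag")"

  # Build directly under the ECR name, so no separate 'docker tag' step is needed.
  "$DOCKER" build -t "$uri" "$context" || return 1

  # Pipe the temporary ECR password into 'docker login' for this registry.
  "$AWS" ecr get-login-password --region "$region" |
    "$DOCKER" login --username AWS --password-stdin \
      "$(ecr_registry "$account" "$region")" || return 1

  "$DOCKER" push "$uri"
}

# Usage (not executed here; 123456789012 is a placeholder account ID):
#   build_and_push 123456789012 us-east-2 xgboost latest .
```

Passing the region explicitly to get-login-password keeps the login and the pushed URI pointing at the same registry, which is easy to get wrong when the CLI default region differs from the repository's region.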
The Docker image is also pushed to Docker Hub, in a repository with the same name as the one in ECR.
After all of this setup, when I upload a .csv or .xls file, the training job fails after some time with the error shown above, and the process never reaches 100%. Can anyone help me get the training to complete?