
I am working on a machine learning project with the MERN (MongoDB, Express, ReactJS, Node.js) stack. The AWS keys (access key and secret key) are stored in the MongoDB configuration, and the code that trains the model is written in Node.js and Express.

Problem: after I upload a .csv or .xls file, training does not complete in the AWS account. It shows "training failed" with the following error:

AlgorithmError: CannotStartContainerError. Please make sure the container can be run with 'docker run train'. Please refer SageMaker documentation for details. It is possible that the Dockerfile's entrypoint is not properly defined, or missing permissions.
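For context, the `docker run train` in the message refers to how SageMaker starts a training container: it runs the image with the single argument `train`. A minimal sketch for reproducing that start-up locally, assuming Docker is installed; the image name `linear-learner:latest` and the `DRY_RUN` switch are hypothetical names used only for illustration. By default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: reproduce locally how SageMaker launches a training container.
# IMAGE is a hypothetical local image name; replace it with your own tag.
IMAGE="${IMAGE:-linear-learner:latest}"

# Build the command line the error message refers to.
# With an exec-form ENTRYPOINT, "train" is appended to it as an argument.
train_command() {
  printf 'docker run --rm %s train\n' "$1"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only show the command, so the script is safe to run anywhere.
  train_command "$IMAGE"
else
  # Actually start the container and report how it exited.
  docker run --rm "$IMAGE" train
  echo "exit status: $?"
fi
```

Running it with `DRY_RUN=0` should surface the same start-up failure on a local machine if the entrypoint cannot be executed inside the image.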

For machine learning I made the following settings:

S3
- Created an access key and secret key, and created a bucket.
- Placed the keys, the bucket, and the region in the MongoDB configuration.

SageMaker -> Notebook Instance
- Created a notebook instance; the service shows a Pending status.
- This setting was done in the AWS account.

SageMaker -> IAM Role
- Created an IAM role.
- This setting was done in the AWS account.

SageMaker -> Model
- Created a model, with its model ARN and role ARN.
- This setting was done in the AWS account.

ECR (Elastic Container Registry)
- Created repositories named linear-learner and xgboost.
- This setting was done in the AWS account.

Dockerfile
- Created a Dockerfile.
- Saved it in the project folder.

Docker Hub
- Created a Docker Hub account.

SNS credentials
- SNS key and topic ARN.
- Set in the MongoDB configuration.

I have given the following permissions:

Attached directly
- AmazonSageMakerFullAccess
- AmazonS3FullAccess
- AmazonSNSFullAccess
- AmazonEC2ContainerRegistryReadOnly

Attached from group
- AmazonEC2FullAccess
- AmazonDynamoDBFullAccess
- AmazonMachineLearningFullAccess
- AdministratorAccess
- AWSElasticBeanstalkFullAccess
- AmazonSageMakerFullAccess

Dockerfile

FROM Ubuntu
RUN apt-get update
RUN apt-get install curl -y
RUN curl -sL https://deb.nodesource.com/setup_10.x -o nodesource_setup.sh
RUN bash nodesource_setup.sh
RUN apt install nodejs -y
WORKDIR /usr/app
COPY . /usr/app/
RUN npm install
EXPOSE 3000
ENTRYPOINT [ "python3.7", "/opt/ml/code/train.py" ]
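One thing that may be worth checking: the ENTRYPOINT above calls `python3.7`, while the RUN lines do not explicitly install Python. A rough sketch, assuming an exec-form ENTRYPOINT as in the Dockerfile above, that extracts the executable name so it can be looked up inside the built image. The function name `entrypoint_executable` and the `DOCKERFILE` variable are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: find the executable named in an exec-form ENTRYPOINT line.

# $1: path to a Dockerfile.
# Prints the first element of the last ENTRYPOINT [...] array in the file.
entrypoint_executable() {
  grep -E '^[[:space:]]*ENTRYPOINT[[:space:]]*\[' "$1" \
    | tail -n 1 \
    | sed -E 's/^[^"]*"([^"]*)".*$/\1/'
}

DOCKERFILE="${DOCKERFILE:-Dockerfile}"
if [ -f "$DOCKERFILE" ]; then
  exe="$(entrypoint_executable "$DOCKERFILE")"
  echo "ENTRYPOINT executable: $exe"
  # Print (not run) a command that checks whether the executable exists
  # in the image; <image> is a placeholder for the built image name.
  echo "check with: docker run --rm --entrypoint sh <image> -c 'command -v $exe'"
fi
```

If the printed `docker run` check returns nothing for the built image, the container has no such executable and cannot start with that entrypoint.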
Code Image 

Linear : <account_id>.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest

XgBoost : <account_id>.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
I ran the following commands to build and tag the images from the Dockerfile:
-> $ docker build -t <codeimage>:<tag> .
Successfully built <id>
Successfully tagged <codeimage>:<tag>

-> $ docker build -t <account.id>.dkr.ecr.<region>.amazonaws.com/<codeimage> .
Successfully built <id>
Successfully tagged <url>

-> $ docker images
-Check all images with tags

-> $ docker tag [image name]:[tag] [repository URI]

-> $ sudo aws ecr get-login-password | sudo docker login --username AWS --password-stdin [account.id].dkr.ecr.[region].amazonaws.com
Login Succeeded

-> $ sudo docker push [account.id].dkr.ecr.[region].amazonaws.com/[repository name]

<id> Pushed

<id> Pushed
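The build, tag, login, and push steps above can be sketched as one script. This is only an illustration: the account ID `123456789012` and the other defaults are placeholder values, and the helper names `ecr_registry` and `ecr_image_uri` are made up for this example. The script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: compose the ECR image URI and print the build/login/push sequence.
# ACCOUNT_ID, REGION, REPO and TAG are placeholder values, not real ones.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="${REGION:-us-east-2}"
REPO="${REPO:-linear-learner}"
TAG="${TAG:-latest}"

# Registry host: <account_id>.dkr.ecr.<region>.amazonaws.com
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Full image URI: <registry>/<repository>:<tag>
ecr_image_uri() {
  printf '%s/%s:%s' "$(ecr_registry "$1" "$2")" "$3" "$4"
}

REGISTRY="$(ecr_registry "$ACCOUNT_ID" "$REGION")"
URI="$(ecr_image_uri "$ACCOUNT_ID" "$REGION" "$REPO" "$TAG")"

# Dry run: print the commands instead of executing them.
cat <<EOF
docker build -t $REPO:$TAG .
docker tag $REPO:$TAG $URI
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REGISTRY
docker push $URI
EOF
```

Printing the URI first makes it easy to compare it character by character with the image URI configured for the training job.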

Docker images: [image]

The Docker image was also pushed to Docker Hub, to a repository with the same name as the one in ECR. [image]

After all these settings, when I upload a .csv or .xls file, model training fails after some time with the same "training failed" error mentioned above, and the process never reaches 100%. [image]

With all these settings and permissions in place, the process still does not complete and I keep getting this error. Can anyone help me get it working?

  • Does this answer your question? [Training Failed - AWS Machine Learning](https://stackoverflow.com/questions/65628085/training-failed-aws-machine-learning) – rok Jan 13 '21 at 13:54
  • Hi @rok, yes, this is related to the question "Training Failed - AWS Machine Learning", but this one has more details, which should make my problem easier to understand. It would be very nice if you could help me. – mehul daxini Jan 15 '21 at 08:13
  • Hi @rok, can you do me a favor? I placed the commands in the Dockerfile as per your repo, but I don't know how to run the Dockerfile or what I should do to start the container. Can you give me a suggestion, please? – mehul daxini Jan 19 '21 at 04:16
  • You don't have to start the container; SageMaker will do that for you. But if you want to start your container locally, I suggest referring to the Docker documentation: Docker is not trivial, and you need to study all the key concepts. Sorry, I cannot help more than this. – rok Jan 19 '21 at 08:56
