I am finding it incredibly difficult to follow Ray's guidelines for running a Docker image on a Ray cluster in order to execute a Python script, and I cannot find any simple working examples.

So I have the simplest Dockerfile:

FROM rayproject/ray
WORKDIR /usr/src/app
COPY . .
CMD ["step_1.py"]
ENTRYPOINT ["python3"]

I use this to create an image and push it to Docker Hub ("myimage" is just an example):

docker build -t myimage .   
docker push myimage

"step_1.py" just prints hello every second for 200 seconds:

import time
for i in range(200):
    time.sleep(1)
    print("hello")

This is my config.yaml, again very simple:

cluster_name: simple-1

min_workers: 0
max_workers: 2

docker:
    image: "myimage"    
    container_name: "my_simple_docker_container"
    pull_before_run: True

idle_timeout_minutes: 5

provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a

file_mounts_sync_continuously: False



auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem
head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx826a6b31fd2c
    KeyName: aws_ubuntu_test

    BlockDeviceMappings:
      - DeviceName: /dev/sda1
        Ebs:
          VolumeSize: 200

worker_nodes:
   InstanceType: c5.2xlarge
   ImageId: ami-xxxxx826a6b31fd2c
   KeyName: aws_ubuntu_test
   InstanceMarketOptions:
        MarketType: spot

head_setup_commands:
    - pip install boto3==1.4.8

worker_setup_commands:  []

head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

I run this in the terminal:

ray up simple1.yaml

and get this error every time:

shared connection to x.x.xx.119 closed.
"docker cp" requires exactly 2 arguments.
See 'docker cp --help'.

Usage:  docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
        docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH

Copy files/folders between a container and the local filesystem
Shared connection to x.x.xx.119 closed.

Just to add: the Docker image runs fine on any other remote machine, just not on the Ray cluster.

If someone could please help me, I would be eternally grateful, and I promise to write a tutorial on Medium once I get through this.

jtm101

1 Answer


I think the issue might be the use of ENTRYPOINT. The Ray cluster launcher starts Docker using a command roughly like:

docker run --rm --name <NAME> -d -it --net=host <image_name> bash

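When I ran docker build -t myimage . and then ran docker run --rm -it myimage bash, the trailing bash was passed as an argument to the python3 entrypoint instead of being run as the command, so Docker errored with: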

python3: can't open file 'bash': [Errno 2] No such file or directory
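
To make the failure concrete, here is a small Python sketch that approximates how Docker combines an exec-form ENTRYPOINT, the CMD, and any arguments given after the image name in docker run. The function name effective_command is hypothetical and only illustrates the rule; it is not part of Docker or Ray.

```python
from typing import List, Optional


def effective_command(
    entrypoint: Optional[List[str]],
    cmd: Optional[List[str]],
    run_args: Optional[List[str]] = None,
) -> List[str]:
    """Approximate the process Docker starts for exec-form ENTRYPOINT/CMD.

    Arguments given after the image name in `docker run` replace CMD,
    and the result is appended to ENTRYPOINT (if any).
    """
    args = run_args if run_args else (cmd or [])
    return list(entrypoint or []) + list(args)


# Dockerfile from the question: ENTRYPOINT ["python3"], CMD ["step_1.py"]
print(effective_command(["python3"], ["step_1.py"]))            # ['python3', 'step_1.py']
print(effective_command(["python3"], ["step_1.py"], ["bash"]))  # ['python3', 'bash']

# Assumed alternative: no ENTRYPOINT, CMD ["python3", "step_1.py"]
print(effective_command(None, ["python3", "step_1.py"], ["bash"]))  # ['bash']
```

Under this rule, dropping ENTRYPOINT and using CMD ["python3", "step_1.py"] should let the launcher's trailing bash replace the default command, while a plain docker run myimage still runs the script.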
Ian Rodney
  • This worked. The issue I'm having now is that I cannot see the output of the Docker container. Do you know where I might find it? – jtm101 Jan 18 '21 at 11:48