I am trying to use the Ray library to launch runs on multiple remote machines with Docker. Following the docs, I use ray up CONFIG_YAML to set up my cluster and ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT to run a script on it. The problem is that the process/container launches only on the head node; nothing runs on the worker nodes.

Looking at the source, ray up CONFIG_YAML calls the function create_or_update_cluster, and ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT calls submit. Neither of these appears to interact with any node except the head.

Here is my Dockerfile: https://github.com/lobachevzky/ppo/blob/debug/Dockerfile

Here is my cluster config file: https://github.com/lobachevzky/ppo/blob/debug/tune.yaml

Here is my script: https://github.com/lobachevzky/ppo/blob/debug/tune_script.py

ethanabrooks

0 Answers0