I am trying to use the Ray library to launch runs on multiple remote machines with Docker. Per the docs, I use `ray up CONFIG_YAML` to set up my cluster and `ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT` to run a script on it. The problem is that the process/container only launches on the head node; nothing runs on the workers.
Examining the source, `ray up CONFIG_YAML` calls the function `create_or_update_cluster`, and `ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT` calls `submit`. Neither of these appears to interact with any node except the head.
Here is my Dockerfile: https://github.com/lobachevzky/ppo/blob/debug/Dockerfile
Here is my cluster config file: https://github.com/lobachevzky/ppo/blob/debug/tune.yaml
Here is my script: https://github.com/lobachevzky/ppo/blob/debug/tune_script.py