I am testing ray on 1 head node and 1 cluster node. I started the head node with:

ray start --head --redis-port=6379

and the cluster node with:

ray start --address='<ip_head_node>:6379'

Both the head node and the cluster node have f.py and ray_test.py.

f.py:

def f(num):
    print("f:", num)
    return num

ray_test.py:

import ray
import f

@ray.remote
def r(num):
    return f.f(num)

if __name__ == "__main__":
    num_tasks = 8
    ray.init(address="auto")
    result_ids = [r.remote(t) for t in range(num_tasks)]
    results = ray.get(result_ids)
    print(results)

running:

python ray_test.py

works fine.

But when I modify f.py (e.g. change the print statement) on BOTH the head node and the cluster node (I rsync it over and have verified that the cluster node has the same file), a subsequent:

python ray_test.py

still uses the OLD f.py

I've even tried deleting f.py on the cluster node, but the code still runs when it shouldn't (??) So it seems that f.py is being cached somewhere? Or am I missing something?

I've since narrowed it down to using either

ray.init()

which reloads f.py each time but does not run on the cluster, or

ray.init(address="auto")

which seems to use a cached version of f.py and does not pick up new versions of it across subsequent executions of ray_test.py, but does run on the cluster.
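The symptom is at least consistent with ordinary Python module caching in a long-lived process: once a module is imported, later imports are served from sys.modules even if the file on disk changes. Whether this is what the Ray workers are actually doing in 0.8.5 is an assumption, not something confirmed here. A minimal sketch without Ray, using a throwaway module named demo_f (a hypothetical name used only for this illustration):

```python
import importlib
import os
import sys
import tempfile

# Avoid writing .pyc files so the demo never reads stale bytecode.
sys.dont_write_bytecode = True


def write_module(directory, body):
    """Write the throwaway module demo_f.py (hypothetical name) into directory."""
    path = os.path.join(directory, "demo_f.py")
    with open(path, "w") as fh:
        fh.write(body)
    # Make sure the import system notices the new or changed file.
    importlib.invalidate_caches()


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            write_module(tmp, "def f(num):\n    return ('v1', num)\n")
            import demo_f
            first = demo_f.f(1)

            # Edit the file on disk, as an rsync of a new f.py would.
            write_module(tmp, "def f(num):\n    return ('v2-edited', num)\n")
            import demo_f  # served from sys.modules, file is not re-read
            cached = demo_f.f(1)

            # An explicit reload re-executes the source from disk.
            importlib.reload(demo_f)
            reloaded = demo_f.f(1)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_f", None)
    return first, cached, reloaded


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the second import still returns the old result, and only importlib.reload picks up the edited file. If the Ray workers behave the same way, the old f.py would survive until the worker processes are restarted or the module is reloaded inside them.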

My goal is to be able to modify f.py on head node, rsync it to the cluster node, and run ray_test.py using the NEW f.py, without stopping and restarting ray on both head and cluster. Is that possible, or is that not the correct workflow with ray?

Thanks!

Python 3.6.9, ray==0.8.5

1 Answer

This is being handled in the following GitHub issue instead: https://github.com/ray-project/ray/issues/8550

Sang