I am developing an Actor class and using ray.wait() to collect the results.

Below are the code and the console output. The loop appears to collect results for only 2 of the 3 actors.

import time
import ray


@ray.remote
class Tester:
    def __init__(self, param):
        self.param = param

    def run(self):
        return self.param

params = [0,1,2]
testers = []
for p in params:
    tester = Tester.remote(p)
    testers.append(tester)

runs = []
for i, tester in enumerate(testers):
    runs.append(tester.run.remote())

while len(runs):
    done_id, result_ids = ray.wait(runs)
    #runs size is not decreasing

    result = ray.get(done_id[0])
    print('result:{}'.format(result))
    time.sleep(1)

result:2
(pid=819202) 
(pid=819200) 
(pid=819198) 
result:1
result:0
result:0
result:0
result:0
result:0
...
...
...

The console prints forever because the size of runs never decreases.

When I call ray.wait(runs) and get done_id, I expect the element of runs that matches done_id to be removed, but it is not.

I want the console output to look like this:

result:2
(pid=819202) 
(pid=819200) 
(pid=819198) 
result:1
result:0
dmigo
tompal18
  • It seems understandable to me that `ray.wait(runs)` doesn't remove the elements of `done_id` from `runs`. The [ray doc](https://docs.ray.io/en/latest/ray-core/package-ref.html#ray-wait) doesn't mention removing elements from the input `object_refs` argument. Why do you expect the size of `runs` to be reduced? When no `timeout` is set, the function just waits until `num_returns` objects in `object_refs` are ready and returns them. – hellohawaii Jul 06 '22 at 08:09
  • It is also strange to me that only 2 of the 3 actors print output to the console. I can reproduce this on my machine. I would expect the program to print the output from all 3 actors randomly (and indefinitely). – hellohawaii Jul 06 '22 at 08:23

1 Answer

The script you provided is using ray.wait incorrectly. The following code does what you want:

import time
import ray

@ray.remote
class Tester:
    def __init__(self, param):
        self.param = param

    def run(self):
        return self.param

params = [0, 1, 2]

# I use list comprehensions instead of for loops for terseness.
testers = [Tester.remote(p) for p in params]
not_done_ids = [tester.run.remote() for tester in testers]

# len() is not required to check whether the list is empty.
while not_done_ids:
    
    # Replace not_done_ids with the list of object references that aren't
    # ready. Store the list of object references that are ready in done_ids.
    # timeout=1 means sleep at most 1 second, do not sleep if there are
    # new object references that are ready.
    done_ids, not_done_ids = ray.wait(not_done_ids, timeout=1)
    
    # ray.get can take an iterable of object references.
    done_return_values = ray.get(done_ids)

    # Process each result.
    for result in done_return_values:
        print(f'result: {result}')


I added the following fixes:

  • ray.wait returns two lists: a list of object references that are ready, and a list of object references that may or may not be ready. You should iterate over the first list to get all object references that are ready.
  • Your while loop runs until the list is empty, but runs is never updated, so it loops forever. I simply replaced the runs list with not_done_ids, which is reassigned on every iteration, so that once all object references are ready, the while loop breaks.
  • ray.wait can block for a bounded time with timeout. I removed your sleep and added timeout=1, which lets the program run more efficiently (there is no sleep if another object is ready!).
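
The "reassign the pending collection" pattern from the fixes above can be tried without a Ray cluster. The sketch below uses the standard library's concurrent.futures.wait as a stand-in; it is not Ray's API, and `run` and `collect_all` are hypothetical names chosen for illustration. Like ray.wait, it returns a finished collection and an unfinished one, and it does not modify its input.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run(param):
    # Stand-in for Tester.run: sleep briefly, then return the parameter.
    time.sleep(0.05 * param)
    return param


def collect_all(params):
    results = []
    with ThreadPoolExecutor(max_workers=len(params)) as pool:
        pending = {pool.submit(run, p) for p in params}
        while pending:
            # wait() returns two new sets and leaves its input untouched,
            # so the loop ends only because `pending` is reassigned here.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
    return results


if __name__ == "__main__":
    print(collect_all([0, 1, 2]))
```

Each result is collected exactly once; the order in which results arrive depends on which task finishes first.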
cade
  • Good answer. Would you kindly help explain the phenomenon I mentioned in my comments, that no randomness is present? – hellohawaii Jul 15 '22 at 02:33
  • There shouldn't be randomness besides nondeterminism in which task finishes first. In fact, if you look at the [Ray source code](https://github.com/ray-project/ray/blob/e31baebc4ea03f07df2136862cc38a2a3ab29f60/src/ray/raylet/wait_manager.cc#L81), the `wait` function constructs the returned lists so that their elements have the same order as they had in the input list. So assuming all tasks finish at the same time, you'll see deterministic ordering. – cade Jul 15 '22 at 18:25
  • Thanks for your explanation. But when I execute the code in the question, only 2 of the 3 actors print results in the infinite loop (e.g., the output alternates between `result:0` and `result:1` but never shows `result:2`). It is strange. It neither keeps printing `result:0`, which comes from the first element in the input list, nor prints all three results randomly. Can you reproduce this on your machine? – hellohawaii Jul 15 '22 at 18:32
  • Strange indeed! Could you share your code, perhaps in a new StackOverflow question? The original post doesn't have this behavior (I see `result:2`, then `result:1`, then `result:0`, with `result:0` repeated indefinitely). – cade Jul 15 '22 at 18:49
  • Also, there are a lot of Ray experts on the Ray discuss site: https://discuss.ray.io/ – cade Jul 15 '22 at 18:50
  • Sorry for the late reply. It turns out that one of my actors died at the beginning, so only 2 of the 3 actors printed results. The program raises an exception after a while; I don't know why. Perhaps I will post this question on the site you mentioned. – hellohawaii Jul 26 '22 at 02:29
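
The ordering point made in the comments (returned lists keep the input order) can be illustrated with a toy model. This is plain Python and not Ray's implementation; `toy_wait` and the `finished` set are hypothetical names, and the model simply assumes the behavior cade describes. Under that assumption, it shows why the question's loop ends up printing `result:0` repeatedly.

```python
def toy_wait(refs, finished, num_returns=1):
    """Split refs into (ready, remaining), both in input order.

    Toy model only: `finished` is an assumed set of refs whose
    tasks have completed. The input list is never modified.
    """
    ready = [r for r in refs if r in finished][:num_returns]
    remaining = [r for r in refs if r not in ready]
    return ready, remaining


runs = ["run0", "run1", "run2"]

# Only the last task has finished, so it is the one reported.
first_ready, _ = toy_wait(runs, finished={"run2"})

# All tasks have finished: the first element in input order is reported.
# Because `runs` itself is never shortened, passing it again gives the
# same answer every time.
later_ready, _ = toy_wait(runs, finished={"run0", "run1", "run2"})
```

Passing the second returned list back in on the next call, as the answer does with not_done_ids, is what makes the reported element change and the loop terminate.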