First, note that ray.get is a blocking call. Your program cannot move on to the next line of code until ray.get returns. (You can bound the wait by passing a timeout argument to ray.get; when the timeout expires, ray.get raises an exception instead of waiting forever.)
The deadlock happens because l is blocked until worker.step.remote() is done (ray.get(worker.step.remote())). When the worker.step method is called, it tries to call l.p.remote. w is then blocked until l.p is done, because of ray.get(self.l.p.remote(self.a)). But as you can see, l is blocked and cannot run any other code. That means l.p will never run until l.step is done, and l.step cannot finish until w.step returns. Here is a simple diagram for your understanding.

Now both actors are blocked and l.step.remote() will never finish. That means your driver (the Python script) is also blocked.
As a result, the whole program hangs!
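The same wait cycle can be reproduced without Ray. The sketch below is an analogy, not Ray code: each ThreadPoolExecutor stands in for one actor, and a pool with a single thread can run only one call at a time. The names demo_deadlock, learner_threads and wait_seconds are hypothetical and introduced only for illustration. The timeout exists solely so that the demo returns instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demo_deadlock(learner_threads=1, wait_seconds=0.5):
    """Mimic l.step -> w.step -> l.p with thread pools standing in for actors."""
    learner = ThreadPoolExecutor(max_workers=learner_threads)
    worker = ThreadPoolExecutor(max_workers=1)

    def learner_p(a):
        return a + 3

    def worker_step():
        # Like ray.get(self.l.p.remote(self.a)): block until the learner answers.
        # The timeout is only here so that the demo terminates.
        return learner.submit(learner_p, 1).result(timeout=wait_seconds)

    def learner_step():
        # Like ray.get(worker.step.remote()): occupy a learner thread while waiting.
        return worker.submit(worker_step).result()

    try:
        return learner.submit(learner_step).result()
    except FutureTimeout:
        return "deadlock"
    finally:
        learner.shutdown()
        worker.shutdown()


if __name__ == "__main__":
    # One learner thread: it is stuck in learner_step, so learner_p never starts in time.
    print(demo_deadlock(learner_threads=1))
    # Two learner threads: a free thread can run learner_p, so the cycle resolves.
    print(demo_deadlock(learner_threads=2))
```

With a single learner thread the function returns "deadlock", because learner_p sits in the queue behind learner_step. With two learner threads it returns 4, since a second thread is free to answer the worker.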
So how do you solve this problem?
First, I strongly discourage the pattern in which two actor classes wait for each other. This is generally a bad pattern in any kind of program, not just with Ray. The deadlock can be avoided if the programs are multi-threaded or asynchronous.
If you really need this pattern, you can use async actors. An async actor uses await instead of ray.get, and the actors do not block each other because their methods run as coroutines.
https://ray.readthedocs.io/en/latest/async_api.html
Example:
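import ray

ray.init()


@ray.remote
class Worker:
    def __init__(self):
        self.a = 1
        self.l = None

    def set(self, learner):
        self.l = learner

    async def step(self):
        # await yields control instead of blocking the actor the way ray.get does
        x = await self.l.p.remote(self.a)
        return x


@ray.remote
class Learner:
    def __init__(self):
        self.a = 3

    async def step(self, worker):
        # While this await is pending, the learner is free to run p()
        print(await worker.step.remote())

    async def p(self, a):
        return a + self.a


l = Learner.remote()
w = Worker.remote()
# Wait for set() to finish so the worker holds its learner handle before step() runs
ray.get(w.set.remote(l))
# Top-level await only works in a notebook or async REPL; in a plain script,
# block in the driver instead (the driver is not an actor, so this is safe)
ray.get(l.step.remote(w))
# ray.shutdown()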
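For intuition about why await breaks the cycle, here is a Ray-free sketch that uses only the standard library. It assumes each actor can be modeled as one thread running one asyncio event loop. LoopActor, ToyLearner, ToyWorker, remote, stop and run_demo are hypothetical names for this illustration, and the sketch says nothing about how Ray implements async actors internally.

```python
import asyncio
import threading


class LoopActor:
    """Hypothetical stand-in for an async actor: one thread running one event loop."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def remote(self, coro):
        # Schedule a coroutine on this actor's loop from any thread.
        # Returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


class ToyLearner(LoopActor):
    def __init__(self):
        super().__init__()
        self.a = 3

    async def p(self, a):
        return a + self.a

    async def step(self, worker):
        # await suspends step, so this actor's loop stays free to run p.
        return await asyncio.wrap_future(worker.remote(worker.step()))


class ToyWorker(LoopActor):
    def __init__(self, learner):
        super().__init__()
        self.a = 1
        self.l = learner

    async def step(self):
        # Calls back into the learner while the learner's step is still pending.
        return await asyncio.wrap_future(self.l.remote(self.l.p(self.a)))


def run_demo():
    learner = ToyLearner()
    worker = ToyWorker(learner)
    try:
        # The driver may block; it is not an actor that has to serve calls.
        return learner.remote(learner.step(worker)).result(timeout=5)
    finally:
        worker.stop()
        learner.stop()


if __name__ == "__main__":
    print(run_demo())
```

The learner's loop has only one thread, yet it can serve p while step is suspended at its await. That is the property a blocking ray.get call takes away.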