We have a cluster-sharded actor named A with multiple child actors created using the child-per-entity pattern, as shown below. Actor B Tells 100 messages to actor D, and actor D takes, say, 500 ms to process each message. If, while D is still working through them, we passivate actor A using Context.Parent.Tell(new Passivate(PoisonPill.Instance)); then all child actors, including actor D, stop immediately without processing their pending messages.

    A
    |
    B    
   / \
  C   D

Is there a way to wait for actor D to process all the messages?

csharpdev

2 Answers

https://stackoverflow.com/a/70286526/377476 is a good start; you will need a custom shutdown message. When a parent actor terminates, its children are automatically killed via /system messages, which supersede any unprocessed /user messages in their queues.

So what you need to do is ensure that all of their /user messages are processed prior to the parent terminating itself. There's a straightforward way to do this using the GracefulStop extension method in combination with your custom stop message:

using System;
using System.Linq;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Event;

public sealed class ActorA : ReceiveActor{
    private IActorRef _actorB;  
    
    private readonly ILoggingAdapter _log = Context.GetLogger();
    
    public ActorA(){
        Receive<StartWork>(w => {
            foreach(var i in Enumerable.Range(0, w.WorkCount)){
                _actorB.Tell(i);
            }
        });
        
        ReceiveAsync<MyStopMessage>(async stop => {
            _log.Info("Begin shutdown");
            
            // stop child actor B with the same custom message
            await _actorB.GracefulStop(TimeSpan.FromSeconds(10), stop);
            
            // shut ourselves down after child is done
            Context.Stop(Self);
        });
    }
    
    protected override void PreStart(){
        _actorB = Context.ActorOf(Props.Create(() => new ActorB()), "b");
    }
}

public sealed class ActorB : ReceiveActor{
    private IActorRef _actorC;
    private IActorRef _actorD;
    
    private readonly ILoggingAdapter _log = Context.GetLogger();
    
    public ActorB(){
        Receive<int>(i => {
            _actorC.Tell(i);
            _actorD.Tell(i);
        });
        
        ReceiveAsync<MyStopMessage>(async _ => {
            
            _log.Info("Begin shutdown");
            
            // stop both actors in parallel
            var stopC = _actorC.GracefulStop(TimeSpan.FromSeconds(10));
            var stopD = _actorD.GracefulStop(TimeSpan.FromSeconds(10));
            
            // compose stop Tasks
            var bothStopped = Task.WhenAll(stopC, stopD);
            await bothStopped;
            
            // shut ourselves down immediately
            Context.Stop(Self);
        });
    }
    
    protected override void PreStart(){
        var workerProps = Props.Create(() => new WorkerActor());
        _actorC = Context.ActorOf(workerProps, "c");
        _actorD = Context.ActorOf(workerProps, "d");
    }
}

public sealed class WorkerActor : ReceiveActor {
    private readonly ILoggingAdapter _log = Context.GetLogger();
    
    public WorkerActor(){
        ReceiveAsync<int>(async i => {
            await Task.Delay(10);
            _log.Info("Received {0}", i);
        });
    }
}

I've created a runnable version of this sample here: https://dotnetfiddle.net/xiGyWM - you'll see that the MyStopMessages are received not long after the sample starts, but after C and D have been given work. All of that work completes before any actors terminate in this scenario.
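For readers who cannot open the fiddle, a driver along the following lines should exercise the actors above. It is only a sketch: the StartWork and MyStopMessage definitions are assumptions (the answer uses both types without showing them), and the system name and the 30-second timeout are arbitrary.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

// Assumed message shapes; the answer references these types without defining them.
public sealed class StartWork
{
    public StartWork(int workCount) => WorkCount = workCount;
    public int WorkCount { get; }
}

public sealed class MyStopMessage { }

public static class Program
{
    public static async Task Main()
    {
        var system = ActorSystem.Create("demo"); // hypothetical system name
        var actorA = system.ActorOf(Props.Create(() => new ActorA()), "a");

        // Queue the work first so that C and D have pending messages.
        actorA.Tell(new StartWork(100));

        // GracefulStop sends MyStopMessage to A and completes once A has terminated.
        // The returned task fails if A has not stopped within the timeout.
        await actorA.GracefulStop(TimeSpan.FromSeconds(30), new MyStopMessage());

        await system.Terminate();
    }
}
```

Every timeout in the chain has to exceed the time the workers need to drain their mailboxes. The sample workers take about 10 ms per message, so 10 seconds is plenty; at the 500 ms per message described in the question, 100 messages need roughly 50 seconds, and the GracefulStop timeouts would have to be raised accordingly.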

Aaronontheweb
  • Thank you so much for the timely help, and it solves the issue. – csharpdev Dec 09 '21 at 17:45
  • After this change, we get a "dead letters encountered" message once the actor's graceful stop completes: Info, 21, Akka.Actor.LocalActorRef, Message [ActorTaskSchedulerMessage] from [akka://MyApp/system/sharding/A/70/a7a0fc6c-752e-4b15-b547-e8223facacd0/$c#1348831143] to [akka://MyApp/system/sharding/A/70/a7a0fc6c-752e-4b15-b547-e8223facacd0/$c#1348831143] was not delivered. [1] dead letters encountered, no more dead letters will be logged in next [00:05:00]. If this is not an expected behavior then [akka://MyApp/system/sharding/A/70/a7a0fc6c-752e-4b15-b547-e8223facacd0 – csharpdev Dec 15 '21 at 05:10
  • The above dead letters occur only when we handle MyStopMessage using ReceiveAsync, and this seems related to the issue described in https://github.com/akkadotnet/akka.net/issues/3259. Is there anything specific we should do to avoid these dead letter messages? – csharpdev Dec 15 '21 at 08:43
  • @csharpdev you'd only get those `DeadLetter`s if your actor started processing another message somewhere in your shutdown sequence. Make sure you're using `PoisonPill` to shut things down and not `Context.Stop` - the latter will kill actors immediately regardless of their `async` workloads, by design. – Aaronontheweb Dec 17 '21 at 17:49

Instead of sending a PoisonPill - which is a system message and therefore is handled with a higher priority than traditional messages - you may define your own stop message, and let an actor handle it using Context.Stop(Self).

class MyShardedActor : ReceiveActor {
    public MyShardedActor() {
        Receive<MyStopMessage>(_ => Context.Stop(Self));
    }
}

To have passivation that the cluster triggers on its own use your custom message as well, register it through the ClusterSharding.Start overload that takes a handOffStopMessage parameter. That message is then sent within the Passivate request instead of PoisonPill.
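As a rough sketch of both sides of that setup: the entity can passivate itself with the custom message, and the region can be started with the same message as its hand-off stop message. The IdleEntity class, the StartRegion helper, the type name "A" and the two-minute idle timeout are hypothetical; MyStopMessage is assumed to be a plain class with a public parameterless constructor, and the message extractor and allocation strategy are left to the caller.

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Sharding;

// Hypothetical entity: passivates itself with the custom message when idle.
public sealed class IdleEntity : ReceiveActor
{
    public IdleEntity()
    {
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(2));

        // Ask the parent shard to passivate us with MyStopMessage instead of PoisonPill.
        Receive<ReceiveTimeout>(_ =>
            Context.Parent.Tell(new Passivate(new MyStopMessage())));

        // The shard sends the stop message back; a real entity would first
        // stop its children gracefully before stopping itself.
        Receive<MyStopMessage>(_ => Context.Stop(Self));
    }
}

public static class ShardingSetup
{
    // Hypothetical helper: extractor and strategy are supplied by the caller.
    public static IActorRef StartRegion(
        ActorSystem system,
        IMessageExtractor extractor,
        IShardAllocationStrategy strategy)
    {
        return ClusterSharding.Get(system).Start(
            "A",                                    // type name (assumed)
            Props.Create(() => new IdleEntity()),   // entity props
            ClusterShardingSettings.Create(system), // settings from config
            extractor,
            strategy,
            new MyStopMessage());                   // hand-off stop message
    }
}
```

Whether this exact Start overload is available should be checked against the Akka.Cluster.Sharding version in use.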

Bartosz Sypytkowski
  • I tried what you suggested, but the child actors still stop right away without processing all their messages. If needed, I can share a working sample that reproduces the issue. Thanks. Just to clarify: the pending messages are in actor D, and there are no pending items in actor A. – csharpdev Dec 09 '21 at 09:05
  • Actor A then needs to propagate a stop message to its children (and the children need to likewise propagate the stop message to their children and so on) and then go into a state where it ignores messages from outside its part of the tree and waits for its children to stop before it stops. – Levi Ramsey Dec 09 '21 at 12:55
  • `PoisonPill` isn't a system message - it's handled in the `/user` side of the message queue. The problem in your case is that the queue of messages you need to wait on is different from the actor receiving it. – Aaronontheweb Dec 09 '21 at 15:29
  • However, this answer is fundamentally correct that you need to customize the passivation message in order to change this behavior. – Aaronontheweb Dec 09 '21 at 15:30
  • Thank you Bartosz Sypytkowski and Aaronontheweb – csharpdev Dec 09 '21 at 17:46