Akka.NET cluster intermittent dead letters

Question

We have our cluster running locally (for now) and everything seems to be configured correctly. Our prime calculation messages are distributed over our seed nodes. However, we are intermittently losing messages. You can see the behaviour of two runs in the screenshot. Which messages end up as dead letters differs from run to run.

Our messages are always sent the same way; they look like this. The last parameter is n, meaning we want the nth prime.

new PrimeCalculationEntry(id, 1, 100000),
new PrimeCalculationEntry(id, 2, 150000),
new PrimeCalculationEntry(id, 3, 200000),
new PrimeCalculationEntry(id, 4, 250000),
new PrimeCalculationEntry(id, 5, 300000),
new PrimeCalculationEntry(id, 6, 350000),
new PrimeCalculationEntry(id, 7, 400000),
new PrimeCalculationEntry(id, 8, 450000)

Our cluster is set up like this: one non-seed node hosts a group router (/commander) that sends messages to two seed nodes, each of which hosts a pool router (/cluster).

Non-seed node: localhost:0 (random port)

akka {
    actor {
        provider = cluster
        deployment {
            /commander {
                router = round-robin-group # routing strategy
                routees.paths = ["/user/cluster"] # path of routee on each node
                cluster {
                    enabled = on
                    allow-local-routees = on
                }
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 0 # let the OS pick a random port
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081", "akka.tcp://ClusterSystem@localhost:8082"]
    }
}

Seed node 1: localhost:8081 (leader)

akka {
    actor {
        provider = cluster
        deployment {
            /cluster {
                router = round-robin-pool
                nr-of-instances = 10
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 8081
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    }
}

Seed node 2: localhost:8082

akka {
    actor {
        provider = cluster
        deployment {
            /cluster {
                router = round-robin-pool
                nr-of-instances = 10
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 8082
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    }
}

Can anyone point us in the right direction? Is there anything wrong with our configuration? Thanks in advance.

score 1 · Accepted Answer · answered Mar 03 '22 at 20:00

I think I know what the issue is here: you don't have any akka.cluster.roles defined, nor is your /commander router configured with the use-role setting. Because allow-local-routees = on, the commander's own node is one of the routees, so every Nth message is routed back to that node, which has no /user/cluster actor to receive it, and the message ends up as a dead letter.

To fix this properly, do the following:

1. Have all nodes that can process PrimeCalculationEntry messages declare akka.cluster.roles=[prime].
2. Have the node with the /commander router change its HOCON to:

    /commander {
        router = round-robin-group # routing strategy
        routees.paths = ["/user/cluster"] # path of routee on each node
        cluster {
            enabled = on
            allow-local-routees = on
            use-role = "prime"
        }
    }

This will eliminate the dead letters, because the /commander node will no longer send every Nth message to itself.
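For step 1, here is a minimal sketch of what seed node 1's HOCON could look like with the role added, assuming the rest of the question's configuration stays the same (seed node 2 would be identical apart from port = 8082). The commander node deliberately does not declare the prime role, so use-role = "prime" leaves it out of the routee set even though allow-local-routees is still on.

```hocon
akka {
    actor {
        provider = cluster
        deployment {
            /cluster {
                router = round-robin-pool
                nr-of-instances = 10
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 8081 # 8082 on the second seed node
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
        # marks this node as able to process PrimeCalculationEntry messages;
        # the commander's use-role = "prime" only routes to nodes with this role
        roles = ["prime"]
    }
}
```

With this in place, any additional worker node only needs the same roles entry and a /user/cluster actor; the commander's cluster-aware group router should pick it up as it joins the cluster.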

Note: after looking at your issue, I realized that we don't document the roles concept clearly enough in our literature. I've filed an issue to fix that there: https://github.com/akkadotnet/akka.net/issues/5700 — Aaronontheweb, Mar 03 '22 at 20:00

score 0 · Answer 2 · answered Mar 11 '22 at 08:47


I saw the answer from @Aaronontheweb too late. We "fixed" it by setting allow-local-routees = off in the /commander router's HOCON. But I guess a better solution would be to set up roles correctly, as described in the accepted answer.
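For comparison, a sketch of that workaround: the commander's deployment with local routees disabled, otherwise copied from the question's configuration. It stops the commander from routing to itself, but it says nothing about which nodes actually host /user/cluster, so any other member that lacks that actor (for example, a second commander-style node joining later) would again receive a share of the messages and turn them into dead letters. Roles avoid this by having worker nodes opt in explicitly.

```hocon
akka.actor.deployment {
    /commander {
        router = round-robin-group
        routees.paths = ["/user/cluster"]
        cluster {
            enabled = on
            # never route to /user/cluster on the commander's own node
            allow-local-routees = off
        }
    }
}
```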

Stephan Bisschop

That's also a possible solution to this problem, and your conclusion is correct. Using roles is the most robust way to solve this issue. – Aaronontheweb Mar 14 '22 at 12:44
