
My Akka app on Kubernetes is experiencing delayed heartbeats, even when there is no load.

The following warning also appears constantly:

heartbeat interval is growing too large for address ...

I tried adding a custom dispatcher for the cluster, and even one for every specific actor, but it did not help. I am not doing any blocking operations, since the app is just a simple HTTP server.

When the cluster is under load, the nodes become Unreachable.

I created a repository that can be used to reproduce the issue: https://github.com/CostasChaitas/Akka-Demo

    Possible duplicate of [Akka Cluster heartbeat delays on Kubernetes](https://stackoverflow.com/questions/58015699/akka-cluster-heartbeat-delays-on-kubernetes) – David Ogren Sep 22 '19 at 20:48

2 Answers


First, thanks for the well-documented reproducer. I did find one minor glitch with a dependency you included, but it was easy to resolve.

That said, I was unable to reproduce your errors. Everything worked fine on my local machine and on my dev cluster. You don't include your load generator, so maybe I just wasn't generating as sustained a load, but I got no heartbeat delays at all.

I suspect this is a duplicate of [Akka Cluster heartbeat delays on Kubernetes](https://stackoverflow.com/questions/58015699/akka-cluster-heartbeat-delays-on-kubernetes). If so, it sounds like you've already checked my usual suspects, GC pauses and CFS throttling. And if you are able to reproduce it locally, that also makes my other common culprit, badly configured Kubernetes networking, improbable. (I had one client who was having problems with Akka clustering on Kubernetes, and it turned out to be just a badly configured cluster: the network was dropping and delaying packets between pods.)
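
To double-check the CFS side from inside a running container, here is a minimal sketch that reads the cgroup `cpu.stat` counters and reports how often the container was throttled. The file paths are the usual cgroup v2 and v1 locations, which is an assumption about how the node is set up, and it also assumes Python is available in the container image.

```python
from pathlib import Path

# Assumed locations: cgroup v2 first, then common cgroup v1 mounts.
CPU_STAT_PATHS = (
    "/sys/fs/cgroup/cpu.stat",
    "/sys/fs/cgroup/cpu/cpu.stat",
    "/sys/fs/cgroup/cpu,cpuacct/cpu.stat",
)


def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of CFS periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


def read_cpu_stat():
    """Return parsed counters from the first cpu.stat file found, or {}."""
    for candidate in CPU_STAT_PATHS:
        path = Path(candidate)
        if path.is_file():
            return parse_cpu_stat(path.read_text())
    return {}
```

Calling `throttled_fraction(read_cpu_stat())` a few minutes apart shows whether `nr_throttled` keeps climbing. If it does while heartbeat warnings are logged, CPU throttling is a plausible cause.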

Since you say this is load testing, perhaps you are just running out of sockets or file descriptors? You don't have much in the way of HTTP server configuration (nor any JVM options).
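
One way to test the sockets/files theory is to compare a process's open descriptor count against its limits during the load test. The sketch below is Linux-only and reads `/proc`. The helper names are hypothetical, and which PID the JVM runs as is something you would need to look up in your own container.

```python
import os


def parse_max_open_files(limits_text):
    """Extract (soft, hard) 'Max open files' from /proc/<pid>/limits text.

    A limit reported as 'unlimited' is returned as None.
    Returns None if the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    return None


def fd_report(pid):
    """Compare a process's open descriptor count with its limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return {"open_fds": open_fds, "limits": limits}
```

If `open_fds` approaches the soft limit while load is applied, descriptor exhaustion becomes a likely explanation.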

I think my next debugging step would be to connect to one of the running containers and test the network between the pods.
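
As a rough sketch of such a test, the script below opens repeated TCP connections to another pod and reports failures and the slowest connect time. It only measures connection setup, not Akka remoting itself, and the host and port you pass in are placeholders for your own pod IP and remoting port.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def probe(host, port, attempts=20, pause=0.5):
    """Connect repeatedly and summarize failures and the slowest connect."""
    samples = []
    for _ in range(attempts):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(pause)
    ok = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failed": attempts - len(ok),
        "max_ms": max(ok) if ok else None,
    }
```

For example, `probe("10.0.0.12", 25520)` would target a hypothetical pod IP on an assumed remoting port. Failed attempts or connect times in the hundreds of milliseconds between pods in the same cluster would point at the network rather than at Akka.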

David Ogren

I was having the same issue of growing heartbeat intervals, but in my case it began as soon as I started using the cluster, even though the load was not high: I was sending only 2 TPS.

Going through the Akka documentation, I found that Akka discourages setting resources.limits.cpu. I removed it from my deployment manifest file, and it now works fine without delays.

You can refer to the documentation here: https://doc.akka.io/docs/akka/current/additional/deploying.html?_ga=2.222760347.1686781468.1643119007-1504733962.1642433119#resource-limits
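
To find where a CPU limit is still set, a minimal sketch like the following flags the containers in a Deployment manifest that define `resources.limits.cpu`. It assumes the manifest has already been loaded into a Python dict (for instance with a YAML parser), and the container names and resource values in the example are hypothetical.

```python
def containers_with_cpu_limit(deployment):
    """Return names of containers in a Deployment dict that set resources.limits.cpu."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    flagged = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if "cpu" in limits:
            flagged.append(container.get("name", "<unnamed>"))
    return flagged


# Hypothetical manifest: one container with a CPU limit, one with only a CPU request.
example = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "with-limit",
                        "resources": {
                            "requests": {"cpu": "1", "memory": "1Gi"},
                            "limits": {"cpu": "1", "memory": "1Gi"},
                        },
                    },
                    {
                        "name": "request-only",
                        "resources": {
                            "requests": {"cpu": "1", "memory": "1Gi"},
                            "limits": {"memory": "1Gi"},
                        },
                    },
                ]
            }
        }
    },
}

flagged = containers_with_cpu_limit(example)
```

Here only the `with-limit` container is flagged. The `request-only` container keeps its CPU request and memory limit but drops the CPU limit, which is the change described above.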

Karan Khanna