
We have two nodes running the same code. They use Akka.NET in a cluster and send messages to each other over Akka.Remote.
The Akka.NET version is 1.2.0 and we use DotNetty for transport. This is the relevant configuration section:
    actor {
      provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    }
    remote {
      dot-netty.tcp {
        port = 34083
        hostname = host_name
      }
    }

The two nodes run on different Windows servers, each hosted in a Windows Service. Sometimes a node stops listening on its assigned port (verified with netstat -an), and all communication between the nodes is lost until I restart the Windows Service.
This is all the information we get in the logs (the first two messages are from one host and the third is from the other):
    60133 2017-08-11 10:09:11.993 Host1 Akka.Remote.Transport.ProtocolStateActor Error No response from remote. Handshake timed out or transport failure detector triggered.

    60134 2017-08-11 10:09:12.040 Host1 Akka.Remote.ReliableDeliverySupervisor Warn Association with remote system akka.tcp://ProcesamientoActorSystem@warpacb004.nead.danet:34083 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
       at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
       at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
       at Akka.Actor.UntypedActor.Receive(Object message)
       at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
       at Akka.Actor.ActorCell.ReceiveMessage(Object message)
       at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
       at Akka.Actor.ActorCell.Invoke(Envelope envelope)
    --- End of stack trace from previous location where exception was thrown ---
       at Akka.Actor.ActorCell.HandleFailed(Failed f)
       at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]

    60135 2017-08-11 10:09:14.313 Host2 Akka.Remote.ReliableDeliverySupervisor Warn Association with remote system akka.tcp://ProcesamientoActorSystem@warpacb005.nead.danet:34083 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
       at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
       at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
       at Akka.Actor.ActorCell.<>c__DisplayClass112_0.b__0(Object m)
       at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
       at Akka.Actor.ActorCell.ReceiveMessage(Object message)
       at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
       at Akka.Actor.ActorCell.Invoke(Envelope envelope)]

My guess is that something fails at the transport layer, and DotNetty closes the socket and stops listening.
Is there any way to stop this from happening, or at least make it less frequent? If not, can we hook into the failure event to start listening again?

Guillermo Vasconcelos

2 Answers


Without more information I can't comment on the runtime behavior in full, but one thing I can spot immediately is that your inbound listener is bound to localhost, so it can't accept any external connections addressed to this machine.

In general when it comes to Akka.Remote, always use an IP address for your hostname value. Sockets do not natively support DNS and thus we have to resolve all hostnames back into their IP form in order to open a connection. Depending on your network and hardware configuration, internal hostnames can be unreliable.
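
As a rough illustration of the advice above, the configuration section from the question might look like this with an IP address in place of the DNS name. This is only a sketch: 10.0.0.5 is a placeholder address, not one taken from the question, and the rest mirrors the section as posted.

```hocon
actor {
  provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
}
remote {
  dot-netty.tcp {
    port = 34083
    # Placeholder (assumption): replace with the IP address of the
    # network interface this node should listen on.
    hostname = "10.0.0.5"
  }
}
```

Each node would use its own address here. Any place that refers to these nodes by address, such as cluster seed nodes or actor selection paths, would presumably need to use the same IP form, since Akka.Remote matches addresses literally.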

But if you don't mind, please post a complete log for troubleshooting network issues like this.

Aaronontheweb
  • The communication works most of the time, until the error occurs and it stops working. We are using Actor Selection to access the remote actors. The error has happened a few times in the production environment, but it hasn't happened in QA with the same configuration – Guillermo Vasconcelos Aug 11 '17 at 20:37
  • I'm sorry, the localhost part was not correct; we have the host's DNS name there. We will replace it with the IP address to see what happens – Guillermo Vasconcelos Aug 11 '17 at 20:55

We upgraded Akka to 1.2.3 and it started working correctly. We see the same errors in the log from time to time, but the connection is not dropped.

Guillermo Vasconcelos