0

I am having a problem joining and debugging joining to Akka.NET cluster. I am using version 1.3.8. My setup is following:

Lighthouse

Almost default code from github. Runs in console akka.hocon is following:

lighthouse {
  actorsystem: "sng"
}

petabridge.cmd{
    host = "0.0.0.0"
    port = 9110
}

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
                  debug {
                  receive = on
                  autoreceive = on
                  lifecycle = on
                  event-stream = on
                  unhandled = on
              }
  }
  remote {
      log-sent-messages = on
      log-received-messages = on
      log-remote-lifecycle-events = on
        enabled-transports = ["akka.remote.dot-netty.tcp"]
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 4053
    }
    log-remote-lifecycle-events = DEBUG
  }            
  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = [] 
    roles = [lighthouse]
  }
}

Working node

Also console (net461) application with as simple as possible startup and joining. It works as excpected. akka.hocon:

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }

  remote {
      log-sent-messages = on
      log-received-messages = on
      log-remote-lifecycle-events = on
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 0
    }
  }            

  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = ["akka.tcp://sng@127.0.0.1:4053"] 
    roles = [monitor]
  }
}

Not working node

An .NET 4.6.1 library, registerd as COM and started in other (Media Monkey) application with VBA code:

Sub OnStartup
   Set o = CreateObject("MediaMonkey.Akka.Agent.MediaMonkeyAkkaProxy")
   o.Init(SDB)
End Sub

Akka system is, as in console aplikation, created with standard ActorSystem.Create("sng", config);

akka.hocon:

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }
  remote {
      log-sent-messages = on
      log-received-messages = on
      log-remote-lifecycle-events = on
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 0
    }
  }            
  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = ["akka.tcp://sng@127.0.0.1:4053"] 
    roles = [mediamonkey]
  }
}

Debugging workflow

  1. Startup Lighthouse application:

    Configuration Result: [Success] Name sng.Lighthouse [Success] ServiceName sng.Lighthouse Topshelf v4.0.0.0, .NET Framework v4.0.30319.42000 [Lighthouse] ActorSystem: sng; IP: 127.0.0.1; PORT: 4053 [Lighthouse] Performing pre-boot sanity check. Should be able to parse address [akka.tcp://sng@127.0.0.1:4053] [Lighthouse] Parse successful. [21:01:35 INF] Starting remoting [21:01:35 INF] Remoting started; listening on addresses : [akka.tcp://sng@127.0.0.1:4053] [21:01:35 INF] Remoting now listens on addresses: [akka.tcp://sng@127.0.0.1:4053] [21:01:35 INF] Cluster Node [akka.tcp://sng@127.0.0.1:4053] - Starting up... [21:01:35 INF] Cluster Node [akka.tcp://sng@127.0.0.1:4053] - Started up successfully The sng.Lighthouse service is now running, press Control+C to exit. [21:01:35 INF] petabridge.cmd host bound to [0.0.0.0:9110] [21:01:35 INF] Node [akka.tcp://sng@127.0.0.1:4053] is JOINING, roles [lighthouse] [21:01:35 INF] Leader is moving node [akka.tcp://sng@127.0.0.1:4053] to [Up]

  2. Started and stopped working console node

Lighthouse logs:

[21:05:40 INF] Node [akka.tcp://sng@0.0.0.0:37516] is JOINING, roles [monitor]
[21:05:40 INF] Leader is moving node [akka.tcp://sng@0.0.0.0:37516] to [Up]
[21:05:54 INF] Connection was reset by the remote peer. Channel [[::ffff:127.0.0.1]:4053->[::ffff:127.0.0.1]:37517](Id=1293c63a)
[21:05:54 INF] Message AckIdleCheckTimer from akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%400.0.0.0%3A37516-1/endpointWriter to akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%400.0.0.0%3A37516-1/endpointWriter was not delivered. 1 dead letters encountered.
[21:05:55 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 2 dead letters encountered.
[21:05:55 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 3 dead letters encountered.
[21:05:56 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 4 dead letters encountered.
[21:05:56 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 5 dead letters encountered.
[21:05:57 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 6 dead letters encountered.
[21:05:57 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 7 dead letters encountered.
[21:05:58 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 8 dead letters encountered.
[21:05:58 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 9 dead letters encountered.
[21:05:59 WRN] Cluster Node [akka.tcp://sng@127.0.0.1:4053] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://sng@0.0.0.0:37516, Uid=1060233119 status = Up, role=[monitor], upNumber=2)]. Node roles [lighthouse]
[21:06:01 WRN] AssociationError [akka.tcp://sng@127.0.0.1:4053] -> akka.tcp://sng@0.0.0.0:37516: Error [Association failed with akka.tcp://sng@0.0.0.0:37516] []
[21:06:01 WRN] Tried to associate with unreachable remote address [akka.tcp://sng@0.0.0.0:37516]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://sng@0.0.0.0:37516] Caused by: [System.AggregateException: One or more errors occurred. ---> Akka.Remote.Transport.InvalidAssociationException: No connection could be made because the target machine actively refused it tcp://sng@0.0.0.0:37516
   at Akka.Remote.Transport.DotNetty.TcpTransport.<AssociateInternal>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.<Associate>d__22.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__11_54(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
---> (Inner Exception #0) Akka.Remote.Transport.InvalidAssociationException: No connection could be made because the target machine actively refused it tcp://sng@0.0.0.0:37516
   at Akka.Remote.Transport.DotNetty.TcpTransport.<AssociateInternal>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.<Associate>d__22.MoveNext()<---
]
[21:06:04 INF] Cluster Node [akka.tcp://sng@127.0.0.1:4053] - Leader is auto-downing unreachable node [akka.tcp://sng@127.0.0.1:4053]
[21:06:04 INF] Marking unreachable node [akka.tcp://sng@0.0.0.0:37516] as [Down]
[21:06:05 INF] Leader is removing unreachable node [akka.tcp://sng@0.0.0.0:37516]
[21:06:05 WRN] Association to [akka.tcp://sng@0.0.0.0:37516] having UID [1060233119] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

Working node logs:

[21:05:38 INF] Starting remoting
[21:05:38 INF] Remoting started; listening on addresses : [akka.tcp://sng@0.0.0.0:37516]
[21:05:38 INF] Remoting now listens on addresses: [akka.tcp://sng@0.0.0.0:37516]
[21:05:38 INF] Cluster Node [akka.tcp://sng@0.0.0.0:37516] - Starting up...
[21:05:38 INF] Cluster Node [akka.tcp://sng@0.0.0.0:37516] - Started up successfully
[21:05:40 INF] Welcome from [akka.tcp://sng@127.0.0.1:4053]
[21:05:40 INF] Member is Up: Member(address = akka.tcp://sng@127.0.0.1:4053, Uid=439782041 status = Up, role=[lighthouse], upNumber=1)
[21:05:40 INF] Member is Up: Member(address = akka.tcp://sng@0.0.0.0:37516, Uid=1060233119 status = Up, role=[monitor], upNumber=2)
//shutdown logs are missing
  1. Started and stopped COM node

Lighthouse logs:

[21:12:02 INF] Connection was reset by the remote peer. Channel [::ffff:127.0.0.1]:4053->[::ffff:127.0.0.1]:37546](Id=4ca91e15)

COM node logs:

[WARNING][18. 07. 2018 19:11:15][Thread 0001][ActorSystem(sng)] The type name for serializer 'hyperion' did not resolve to an actual Type: 'Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion'
[WARNING][18. 07. 2018 19:11:15][Thread 0001][ActorSystem(sng)] Serialization binding to non existing serializer: 'hyperion'
[21:11:15 DBG] Logger log1-SerilogLogger [SerilogLogger] started
[21:11:15 DBG] StandardOutLogger being removed
[21:11:15 DBG] Default Loggers started
[21:11:15 INF] Starting remoting
[21:11:15 DBG] Starting prune timer for endpoint manager...
[21:11:15 INF] Remoting started; listening on addresses : [akka.tcp://sng@0.0.0.0:37543]
[21:11:15 INF] Remoting now listens on addresses: [akka.tcp://sng@0.0.0.0:37543]
[21:11:15 INF] Cluster Node [akka.tcp://sng@0.0.0.0:37543] - Starting up...
[21:11:15 INF] Cluster Node [akka.tcp://sng@0.0.0.0:37543] - Started up successfully
[21:11:15 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:15 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:16 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+JoinSeedNodes
[21:11:16 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:26 WRN] Couldn't join seed nodes after [2] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]
[21:11:31 WRN] Couldn't join seed nodes after [3] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]
[21:11:36 WRN] Couldn't join seed nodes after [4] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]
[21:11:40 ERR] No response from remote. Handshake timed out or transport failure detector triggered.
[21:11:40 WRN] AssociationError [akka.tcp://sng@0.0.0.0:37543] -> akka.tcp://sng@127.0.0.1:4053: Error [Association failed with akka.tcp://sng@127.0.0.1:4053] []
[21:11:40 WRN] Tried to associate with unreachable remote address [akka.tcp://sng@127.0.0.1:4053]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://sng@127.0.0.1:4053] Caused by: [Akka.Remote.Transport.AkkaProtocolException: No response from remote. Handshake timed out or transport failure detector triggered.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.AkkaProtocolTransport.<Associate>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.EndpointWriter.<AssociateAsync>d__23.MoveNext()]
[21:11:40 DBG] Disassociated [akka.tcp://sng@0.0.0.0:37543] -> akka.tcp://sng@127.0.0.1:4053
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 1 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 2 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 3 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 4 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 5 dead letters encountered.
[21:11:40 INF] Message AckIdleCheckTimer from akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%40127.0.0.1%3A4053-1/endpointWriter to akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%40127.0.0.1%3A4053-1/endpointWriter was not delivered. 6 dead letters encountered.
[21:11:41 WRN] Couldn't join seed nodes after [5] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]
[21:11:41 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 7 dead letters encountered.
[21:11:46 WRN] Couldn't join seed nodes after [6] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]
[21:11:51 WRN] Couldn't join seed nodes after [7] attempts, will try again. seed-nodes=[akka.tcp://sng@127.0.0.1:4053]

Do you have any idea how to debug and/or resolve this?

Rok
  • 451
  • 3
  • 21

1 Answers1

0

As I can see that the first thing I notice in the non-working node the hocon configuration contains different "seed-nodes" address from the working node.

IMHO the "seed-nodes" in all the applications [nodes as called in cluster] withinvthe cluster needs to be same. So in the non-working node instead of

seed-nodes = ["akka.tcp://songoulash@127.0.0.1:4053"] 

replace with the below which is in the working node

seed-nodes = ["akka.tcp://sng@127.0.0.1:4053"] 

Also, please check the github link for sample https://github.com/AJEETX/Akka.Cluster

and another link https://github.com/AJEETX/AkkaNet.Cluster.RoundRobinGroup

@Rok, Kindly let me know if this was helpful or I can further try to investigate.

Ajeet Kumar
  • 687
  • 6
  • 12
  • Thank you for response. You have a good eye, but unfortunetly it was my typo in this post. All my configs have the same Actor System name. And connections (firewall wise) do happen because when I terminate COM node, lighthouse says "Connection was reset by the remote peer". There is one thing that I noticed. COM node is listening on port 37543 but lighthouse is connected on port 37546 to COM node. It could be something with that? – Rok Jul 25 '18 at 19:51
  • Another interesting thing. I downloaded all sources required for Akka (Akka, DotNetty,...) and after I replaced them with nugget I was joining cluster was successful. Well, I did upgrade frameworks (net45 -> net461 and netstandard1.6->netstandard2.0) for nuget consolidation purposes because I am having problem with loading assembly in COM environment.. – Rok Jul 25 '18 at 20:30
  • @RokB, with the logs I was not able to figure out. May be if you 'd put the code somewhere like github . Then I shall try to reproduce and see. – Ajeet Kumar Jul 26 '18 at 07:29