I'm new to Akka. I had a local actor system talking to a remote system on another machine in our network just fine for a couple of days, and then it stopped working for reasons I haven't been able to fathom.

I'm aware that the severing of an association isn't necessarily a problem (#6 at http://petabridge.com/blog/top-7-akkadotnet-stumbling-blocks/), but in my case it's definitely not something that should be happening. I'm unable to get the result of any work that I want performed by the remote actor, and when I log onto the remote machine and look at its output, I'm not seeing any of the messages acknowledging receipt of a request that I coded it to print to its console.

This is what I'm seeing on the remote machine, as soon as I spawn the actor and deploy it to the remote machine (spawne myActorSystem actorName <@ expression @> [ SpawnOption.Deploy (Akka.Actor.Deploy (RemoteScope parsedAddress)) ]):

( See output at http://www.miloonline.net/stash/akka_remote_error.txt )

My local system's configuration is:

sprintf
    """akka {
        actor {
            provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"
            serializers {
                wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
            }
            serialization-bindings {
                "System.Object" = wire
            }
        }
        remote {
            helios.tcp {
                hostname = %s
                port = 0        // Auto-configure port
            }
        }
    }"""
    (Net.Dns.GetHostName ())

... and on the remote machine:

sprintf
    """akka {
        actor {
            provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"
            serializers {
                wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
            }
            serialization-bindings {
                "System.Object" = wire
            }
        }
        remote {
            helios.tcp {
                hostname = %s
                port = 1234
            }
        }
    }"""
    (Net.Dns.GetHostName ())

Again, these worked fine for two or three days; I've pored over my code changes in Git, and there is nothing that would explain the sudden and persistent failure.

EDIT: Originally, I had the remote block of the HOCON configuration inside the actor block on my local machine. I moved that out of the actor block, and now the output on the remote machine has changed (I've edited the blockquote of the remote output to reflect this). I'm still seeing errors, though, and my attempts to have the remote actor do work and send back a value are still failing.

EDIT: I've moved my remote actor system over to a Windows Server machine, which has eliminated the former SocketException errors. Unfortunately, the problem I'm having now is that after my remote system communicates with my local system just fine for one session, it never works again after I terminate the remote process. Any and all succeeding attempts to establish the exact same set-up result in the familiar EndpointDisassociatedException failures, even after I reboot both machines. (As indicated above, see http://www.miloonline.net/stash/akka_remote_error.txt for the output from the remote actor system.) Is there some standardized means of terminating or instantiating an actor system that would address this issue?

MiloDC
  • Your log trace seems to show only one side of the connection, saying nothing more than that a disassociation occurred. The reason for it is probably in the logs of the second machine. What could have happened is some kind of memory leak that caused an out-of-memory exception (in the case of F#, if the actor was built with a function that couldn't go through tail recursion optimization, it's possible that a stack overflow occurred after some message was received). – Bartosz Sypytkowski Jan 17 '17 at 09:44
  • I still haven't managed to pinpoint the problem here. I've tried setting up a remote actor system on numerous Windows Server machines, and it always works the very first time. Then, when I terminate the process running the remote actor system and try to establish it again, it fails from then on. I get a litany of nebulous warnings and errors from the remote system the instant I attempt to spawn an actor on it. (See http://www.miloonline.net/stash/akka_remote_error.txt) The problem persists even after I reboot both machines. – MiloDC Jan 30 '17 at 20:40
  • Are you sure you've set an `akka.cluster.auto-down-unreachable-after` timeout? It sounds like your nodes are waiting indefinitely for unreachable nodes to come back up after they leave the cluster. – Bartosz Sypytkowski Jan 31 '17 at 12:59
  • I very much appreciate your input, Horusiath, but why would everything work fine once, and then fail every time afterward, because I neglected to add that setting? (I'm not saying you're wrong, I'm just asking to be informed.) My remoting set-up is very basic at this point, I'm not even using the akka.cluster library. Just instantiating a couple of systems with the HOCON settings mentioned above, then spawning a remote actor on one of them. I tried adding that `akka.cluster.auto-down-unreachable-after` setting (with the value set to 10s, then again at 2s); it made no difference. – MiloDC Jan 31 '17 at 20:35
