I'm trying to implement the Raft Consensus Algorithm for a Distributed System project.

I need a very quick way to know whether server A is reachable from server B AND A's distributed system is up. In other words, A may be reachable from B while A's cloud system isn't up yet. So I think that InetAddress.getByName(ip).isReachable(timeout); isn't enough.

Since each server's stub is bound under the server's name, I thought I would get the server's registry and then check whether it contains a stub with the server's name: if not, skip to the next server; otherwise perform the lookup (which can take a long time). This is part of the code:

try {
    System.out.println("Getting "+clusterElement.getId()+"'s registry");
    Registry registry = LocateRegistry.getRegistry(clusterElement.getAddress());
    System.out.println("Checking contains:");
    if(!Arrays.asList(registry.list()).contains(clusterElement.getId())) {
        System.out.println("Server "+clusterElement.getId()+" not bound (maybe down?)!");
        continue;
    }
    System.out.println("Looking up "+clusterElement.getId()+"'s stub");
    ServerInterface stub = (ServerInterface) registry.lookup(clusterElement.getId());
    System.out.println("Asking vote to "+clusterElement.getId());
    // here methods are called on the stub (using a custom SocketFactory)
} catch (NoSuchObjectException | java.rmi.ConnectException | java.rmi.ConnectIOException e){
    System.err.println("Candidate "+serverRMI.id+" cannot request vote to "+clusterElement.getId()+" because not reachable");
} catch (UnmarshalException e) {
    System.err.println("Candidate " + serverRMI.id + " timeout requesting vote to " + clusterElement.getId());
} catch (RemoteException e) {
    e.printStackTrace();
} catch (NotBoundException e) {
    System.out.println("Candidate "+serverRMI.id+" NotBound "+clusterElement.getId());
}

Now the problem is that the server gets stuck on the contains() line: the message "Checking contains:" is printed, while "Looking up..." isn't.

Why does this happen? Is there any way to speed up the process? This algorithm is FULL of timeouts, so any suggestion would be really appreciated!

UPDATE: After trying every possible VM property about RMI timeouts, like: -Dsun.rmi.transport.tcp.responseTimeout=1 -Dsun.rmi.transport.proxy.connectTimeout=1 -Dsun.rmi.transport.tcp.handshakeTimeout=1 I didn't see any difference at all, even though an exception should have been thrown at every RMI operation (since each timeout is set to 1 ms!).

The only solution I have found for this problem is to use this RMISocketFactory reimplementation:

final int timeoutMillis = 100;
RMISocketFactory.setSocketFactory(new RMISocketFactory() {
    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        socket.setSoTimeout(timeoutMillis);
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return socket;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
});
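
For illustration, the same idea can be written as a named class with separate connect and read timeouts. This is only a sketch: the class name TimeoutSocketFactory and the two constructor parameters are hypothetical, not part of the original code, and the values passed in are up to the caller.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Hypothetical named variant of the anonymous factory above.
final class TimeoutSocketFactory extends RMISocketFactory {
    private final int connectTimeoutMillis; // limit for establishing the TCP connection
    private final int readTimeoutMillis;    // limit for each blocking read on the socket

    TimeoutSocketFactory(int connectTimeoutMillis, int readTimeoutMillis) {
        this.connectTimeoutMillis = connectTimeoutMillis;
        this.readTimeoutMillis = readTimeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            // A read that exceeds this limit throws SocketTimeoutException.
            socket.setSoTimeout(readTimeoutMillis);
            // A connect that exceeds this limit throws SocketTimeoutException too.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor on failure
            throw e;
        }
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
}
```

It would be installed once, before any RMI call, with RMISocketFactory.setSocketFactory(new TimeoutSocketFactory(100, 500)); the factory can only be set once per JVM. Keeping the two limits separate allows a short connect timeout while leaving remote calls more time to answer.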
justHelloWorld

1 Answer

It gets stuck in Registry.list(). It will time out eventually.

You'd be better off just calling lookup() without this prior step, which doesn't add any value, and investigating all the timeout options mentioned in the two properties pages linked from the RMI Home Page.
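
A minimal sketch of what calling lookup() directly could look like, assuming the caller only needs to know whether a stub was obtained. The class DirectLookup and the method tryLookup are hypothetical names used for illustration; the RMI calls themselves are the standard LocateRegistry and Registry API.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.util.Optional;

// Hypothetical helper: one remote call instead of list() followed by lookup().
final class DirectLookup {
    static Optional<Remote> tryLookup(String host, int port, String name) {
        try {
            // getRegistry only builds a local reference; no network traffic yet.
            Registry registry = LocateRegistry.getRegistry(host, port);
            // lookup is the remote call; it fails if the registry is down
            // or if nothing is bound under the given name.
            return Optional.of(registry.lookup(name));
        } catch (NotBoundException e) {
            return Optional.empty(); // registry is up, but the server is not bound
        } catch (RemoteException e) {
            return Optional.empty(); // registry unreachable, refused, or timed out
        }
    }
}
```

In the election loop, an empty result would simply mean "skip this server". How long an unreachable host blocks the call still depends on the timeouts in effect.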

user207421
  • Thanks for your reply, but I think the Javadoc about the RMI properties is a total mess. Anyway, I updated the original question to include the SocketFactory that I'm using to time out the stub's method calls. Do you think this timeout is used for the `lookup` operation too? – justHelloWorld Mar 31 '15 at 09:16
  • Ok, through the documentation I found these three interesting properties: `sun.rmi.transport.proxy.connectTimeout`: it seems to be the one I'm looking for, since it's the timeout for creating a socket. The part that I don't understand is the one about HTTP. `sun.rmi.transport.tcp.handshakeTimeout`: this could be it too, but I don't think so, since the server is reachable and so the TCP handshake is possible. `sun.rmi.transport.tcp.responseTimeout`: definitely not, since it concerns method invocations – justHelloWorld Mar 31 '15 at 09:34
  • `list()` and `lookup()` *are* remote methods. I would set *all* the timeouts. – user207421 Mar 31 '15 at 09:46
  • I know that this is a CLASSIC comment, but they don't seem to take effect at all :( Even if I use: -Dsun.rmi.transport.tcp.responseTimeout=1 -Dsun.rmi.transport.proxy.connectTimeout=1 -Dsun.rmi.transport.tcp.handshakeTimeout=1 I don't see any exception raised (even though I'm testing on localhost, it should take longer than one millisecond!)... And I'm still using the SocketFactory too – justHelloWorld Mar 31 '15 at 09:55
  • You need to set those properties before executing any RMI methods. – user207421 Mar 31 '15 at 09:56
  • I'm setting them as JVM arguments when calling `java` commands...I think that's the best that I can do :D – justHelloWorld Mar 31 '15 at 09:57
  • Anyway, I don't get the difference between the `SocketFactory` timeout features and the `responseTimeout`: which one "prevails" over the other? From what I understand I have to set both, but I don't fully understand by which criteria. – justHelloWorld Mar 31 '15 at 11:10
  • Look at my solution (in the UPDATE section). What do you think about it? Why, in your opinion, is it the only solution that works? – justHelloWorld Mar 31 '15 at 17:33
  • connectTimeout times out the connect attempt; readTimeout operates at the server; responseTimeout operates at the client; proxyTimeout operates when the client is using HTTP tunneling; etc. This is all documented. The 1ms timeouts you are setting are absurdly short and may have been lengthened by RMI. – user207421 Mar 31 '15 at 19:23
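
As a sketch of the "set them before executing any RMI methods" advice, the client-side properties discussed above can also be set programmatically at the very start of main, assuming nothing has touched RMI yet. The class RmiTimeouts, the method applyClientTimeouts, and the millisecond values in the usage note are hypothetical; only the two property names come from the discussion. The proxy timeout is left out because, as noted above, it concerns HTTP tunneling.

```java
// Hypothetical helper; call it before any RMI class is used.
final class RmiTimeouts {
    static void applyClientTimeouts(int handshakeMillis, int responseMillis) {
        // Client-side limit for the initial protocol handshake with the server.
        System.setProperty("sun.rmi.transport.tcp.handshakeTimeout",
                Integer.toString(handshakeMillis));
        // Client-side limit for waiting on the reply to a remote call,
        // which includes registry calls such as list() and lookup().
        System.setProperty("sun.rmi.transport.tcp.responseTimeout",
                Integer.toString(responseMillis));
    }
}
```

For example, RmiTimeouts.applyClientTimeouts(1000, 2000) as the first statement of main. Whether values set this way take effect is an assumption that depends on the JVM reading them only when RMI is first initialized, so passing them as -D arguments on the command line remains the safer route.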