Starting first node in Neo4j HA cluster fails even when allowed to create cluster

Question

Whilst trying to diagnose a different issue with my cluster, I tried isolating my environments to force election events. When I started nodes in isolation, though, my app failed to start with this exception:

Caused by: java.util.concurrent.TimeoutException: null
    at org.neo4j.cluster.statemachine.StateMachineProxyFactory$ResponseFuture.get(StateMachineProxyFactory.java:300) ~[neo4j-cluster-2.0.1.jar:2.0.1]
    at org.neo4j.cluster.client.ClusterJoin.joinByConfig(ClusterJoin.java:158) ~[neo4j-cluster-2.0.1.jar:2.0.1]
    at org.neo4j.cluster.client.ClusterJoin.start(ClusterJoin.java:91) ~[neo4j-cluster-2.0.1.jar:2.0.1]
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:503) ~[neo4j-kernel-2.0.1.jar:2.0.1]
    ... 59 common frames omitted

My configuration sets a 60-second join timeout (ha.cluster_join_timeout) and allows each individual node to initialize the cluster (ha.allow_init_cluster).
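For reference, here is a minimal standalone sketch of what these two settings amount to. It assumes the timeout is written as 60s. The HaJoinConfig class and its toMillis helper are hypothetical and are not Neo4j's own configuration parsing. The helper only handles the ms, s and m suffixes. The point is that the join timeout reaches ClusterJoin as a millisecond value, 60000 here, because it is passed to clusterConfig.get(..., TimeUnit.MILLISECONDS).

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class HaJoinConfig {

    // Hypothetical helper: converts a duration string such as "60s" or "500ms"
    // into milliseconds. Only the "ms", "s" and "m" suffixes are handled here.
    static long toMillis(String duration) {
        String d = duration.trim();
        if (d.endsWith("ms")) {
            return Long.parseLong(d.substring(0, d.length() - 2));
        } else if (d.endsWith("s")) {
            return TimeUnit.SECONDS.toMillis(Long.parseLong(d.substring(0, d.length() - 1)));
        } else if (d.endsWith("m")) {
            return TimeUnit.MINUTES.toMillis(Long.parseLong(d.substring(0, d.length() - 1)));
        }
        // A bare number is treated as milliseconds (assumption for this sketch).
        return Long.parseLong(d);
    }

    public static void main(String[] args) {
        // The two settings from the question, as plain key/value pairs.
        Properties props = new Properties();
        props.setProperty("ha.cluster_join_timeout", "60s");
        props.setProperty("ha.allow_init_cluster", "true");

        long joinTimeoutMs = toMillis(props.getProperty("ha.cluster_join_timeout"));
        boolean allowInit = Boolean.parseBoolean(props.getProperty("ha.allow_init_cluster"));

        System.out.println("join timeout (ms): " + joinTimeoutMs);
        System.out.println("allowed to create cluster: " + allowInit);
    }
}
```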

Looking at a truncated excerpt of the ClusterJoin class, I believe that in the failure cases the code either loops and retries the join, or the current node creates a new cluster.

private void joinByConfig() throws TimeoutException
{
    while( true )
    {
        if ( config.getClusterJoinTimeout() > 0 )
        {
            try
            {
                console.log( "Joined cluster:" + clusterConfig.get( config.getClusterJoinTimeout(), TimeUnit.MILLISECONDS ) );
                return;
            }
            catch ( InterruptedException e )
            {
                console.log( "Could not join cluster, interrupted. Retrying..." );
            }
            catch ( ExecutionException e )
            {
                logger.debug( "Could not join cluster " + this.config.getClusterName() );
                if ( e.getCause() instanceof IllegalStateException )
                {
                    throw ((IllegalStateException) e.getCause());
                }

                if ( config.isAllowedToCreateCluster() )
                {
                    // Failed to join cluster, create new one
                    console.log( "Could not join cluster of " + hosts.toString() );
                    console.log( format( "Creating new cluster with name [%s]...", config.getClusterName() ) );
                    cluster.create( config.getClusterName() );
                    break;
                }

                console.log( "Could not join cluster, timed out. Retrying..." );
            }
            // No catch for TimeoutException: it propagates out of joinByConfig()
        }
        // ... (remainder of the loop truncated)
    }
    // ... (remainder of the method truncated)
}
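To make the control flow in the excerpt concrete, here is a minimal standalone sketch. JoinFlowSketch and joinByConfigSketch are hypothetical names, not Neo4j code. A CompletableFuture stands in for the clusterConfig future, and the retry loop is omitted, so the retry and interrupt paths just return a marker string. It mirrors only the catch structure above. A join that fails with an ExecutionException falls back to cluster creation. A join that never gets an answer raises a TimeoutException that nothing catches.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinFlowSketch {

    // Mirrors the catch structure of the quoted joinByConfig (loop omitted):
    // only InterruptedException and ExecutionException are handled, so a
    // TimeoutException from get() propagates to the caller.
    static String joinByConfigSketch(Future<String> clusterConfig, long timeoutMs, boolean allowedToCreate)
            throws TimeoutException {
        try {
            return "joined: " + clusterConfig.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // The real code logs and retries; here we restore the flag and stop.
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            if (e.getCause() instanceof IllegalStateException) {
                throw (IllegalStateException) e.getCause();
            }
            // "retry" marks where the real code would loop again.
            return allowedToCreate ? "created new cluster" : "retry";
        }
    }

    public static void main(String[] args) throws Exception {
        // A join attempt that fails with an error: falls back to cluster creation.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("no members responded"));
        System.out.println(joinByConfigSketch(failed, 100, true));

        // A join attempt that never completes: TimeoutException escapes.
        CompletableFuture<String> silent = new CompletableFuture<>();
        try {
            joinByConfigSketch(silent, 100, true);
        } catch (TimeoutException e) {
            System.out.println("TimeoutException propagated despite allowedToCreate=true");
        }
    }
}
```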

However, TimeoutException is not one of these cases; in fact, joinByConfig itself declares that it throws TimeoutException. The StateMachineProxyFactory$ResponseFuture class (which implements Future) throws a TimeoutException when the timeout elapses without any state machine response having been received.

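public synchronized Object get( long timeout, TimeUnit unit )
        throws InterruptedException, ExecutionException, TimeoutException
{
    // If a response has already arrived, its result is fetched here but not returned
    if ( response != null )
    {
        getResult();
    }

    // Block until notified or until the timeout elapses
    this.wait( unit.toMillis( timeout ) );

    // Still no response after waiting: report a timeout
    if ( response == null )
    {
        throw new TimeoutException();
    }
    return getResult();
}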
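As a point of comparison, here is a minimal standalone analogue of a wait/notify response future. SimpleResponseFuture is a hypothetical class, not Neo4j's ResponseFuture. It shows the same contract: get() throws TimeoutException when no reply has been delivered before the deadline. Unlike the quoted excerpt, it returns straight away if a response is already present. It also re-checks the remaining time after every wake-up, because Object.wait can return early.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical minimal analogue of a wait/notify-based response future.
public class SimpleResponseFuture {
    private Object response; // set by the thread that receives the reply

    public synchronized void setResponse(Object value) {
        response = value;
        notifyAll(); // wake any thread blocked in get()
    }

    public synchronized Object get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        // Loop guards against spurious wake-ups; skipped entirely if a reply already arrived.
        while (response == null) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                throw new TimeoutException(); // no reply within the timeout
            }
            wait(remainingMs);
        }
        return response;
    }

    public static void main(String[] args) throws Exception {
        // A reply delivered by another thread is returned to the caller.
        SimpleResponseFuture answered = new SimpleResponseFuture();
        new Thread(() -> answered.setResponse("cluster config")).start();
        System.out.println(answered.get(1, TimeUnit.SECONDS));

        // No reply at all: get() gives up with a TimeoutException.
        SimpleResponseFuture unanswered = new SimpleResponseFuture();
        try {
            unanswered.get(50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out: no response");
        }
    }
}
```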

Shouldn't it be the case that, when joining a cluster times out and the node is configured to initialise a cluster, the TimeoutException is not propagated and a new cluster is initialised instead? If not, do clustered servers always have to be started up in unison?
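The behaviour proposed in the question can be sketched as follows. This is a hypothetical variant for illustration only, not how Neo4j's ClusterJoin behaves. The names JoinOrCreateSketch, joinOrCreate and the cluster name my.cluster are made up, and the returned string stands in for cluster.create(...). A timeout is treated like a failed join, so a node that is allowed to initialise the cluster creates one instead of letting the TimeoutException escape.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinOrCreateSketch {

    // Hypothetical join step in which a timeout is handled like a failed join:
    // a node that is allowed to initialise the cluster creates one instead of
    // letting the TimeoutException escape.
    static String joinOrCreate(Future<String> clusterConfig, long timeoutMs,
                               boolean allowedToCreate, String clusterName)
            throws TimeoutException, InterruptedException {
        try {
            return "joined: " + clusterConfig.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            if (!allowedToCreate) {
                throw e; // unchanged behaviour when cluster creation is not allowed
            }
            // Stand-in for cluster.create(clusterName) in the real code.
            return "created new cluster " + clusterName + " after timeout";
        } catch (ExecutionException e) {
            // Same fallback as the existing ExecutionException branch
            // (IllegalStateException check and retry loop omitted).
            return allowedToCreate ? "created new cluster " + clusterName : "retry";
        }
    }

    public static void main(String[] args) throws Exception {
        // Nobody ever answers the join request, as when a node starts in isolation.
        CompletableFuture<String> silent = new CompletableFuture<>();
        System.out.println(joinOrCreate(silent, 100, true, "my.cluster"));
    }
}
```

One trade-off to weigh with this variant: if several nodes start in isolation and all of them time out, each one would create its own separate cluster rather than joining a common one.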
