
We have 10 Cassandra nodes in production running Cassandra 2.1.8, following a recent upgrade. Previously we ran only 3 nodes on Cassandra 2.1.2. First we upgraded the original 3 nodes from 2.1.2 to 2.1.8 (following the procedure described in Upgrading Cassandra), then we added 7 more nodes running 2.1.8 to the cluster, and then we started our client programs. For the first few hours everything worked fine, but after that we started seeing errors like this in the client logs:

Thread-0 [29/07/15 17:41:23.356] ERROR  com.cleartrail.entityprofiling.engine.InterpretationWriter - Error:com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/172.50.33.161:9041, /172.50.33.162:9041, /172.50.33.95:9041, /172.50.33.96:9041, /172.50.33.165:9041, /172.50.33.166:9041, /172.50.33.163:9041, /172.50.33.164:9041, /172.50.33.42:9041, /172.50.33.167:9041] - use getErrors() for details)
       at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
       at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
       at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
       at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
       at com.cleartrail.entityprofiling.engine.InterpretationWriter.WriteInterpretation(InterpretationWriter.java:430)
       at com.cleartrail.entityprofiling.engine.Profiler.buildProfile(Profiler.java:1042)
       at com.cleartrail.messageconsumer.consumer.KafkaConsumer.run(KafkaConsumer.java:336)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/172.50.33.161:9041, /172.50.33.162:9041, /172.50.33.95:9041, /172.50.33.96:9041, /172.50.33.165:9041, /172.50.33.166:9041, /172.50.33.163:9041, /172.50.33.164:9041, /172.50.33.42:9041, /172.50.33.167:9041] - use getErrors() for details)
       at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
       at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)

I have double-checked the firewall (as suggested in a few posts), the ports, and the timeouts on the client as well as on the nodes, and they are all correct.

I am also not closing the connection anywhere in between. I am using batch queries with a batch size of 1000; the queries are updates to counters in a table with three columns:

entity , twfwv , cvalue

where entity and twfwv are text columns that together form the primary key, and cvalue is a counter column.

I even restarted all my nodes (this trick helped in my dev environment when I hit the same exception), but it's not helping. Please suggest what the probable cause could be.

abi_pat
  • `NoHostAvailableException` is a generic error that means all nodes tried by the driver failed. It would help to see the individual errors; to do that, catch the exception in your code and inspect the result of `getErrors()`. On a side note, which driver version are you using? Recent versions should show the first 3 errors in the main message. – Olivier Michallat Jul 29 '15 at 16:15
  • I am using com.datastax.cassandra cassandra-driver-core, version 2.1.2. I will try to get the exact errors with the getErrors() method and post the output once I have it. – abi_pat Jul 29 '15 at 16:20
  • The error message was improved in 2.1.4 (ticket: [JAVA-409](https://datastax-oss.atlassian.net/browse/JAVA-409)). The latest version is 2.1.7.1. – Olivier Michallat Jul 29 '15 at 16:23
  • Yes, I have upgraded my client and will use 2.1.7.1 from now on. Thanks for the update. – abi_pat Jul 29 '15 at 16:32

2 Answers


My issue was resolved by checking the errors collection of the NoHostAvailableException, as advised by Olivier Michallat in the comments. In my case the cause was the protocol version in the cluster configuration: mine was null, and setting it to 3 fixed the problem.
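As a rough illustration of inspecting that errors collection, here is a minimal sketch that turns the per-host failures into readable log lines. It uses only the JDK so it can be run without the driver. The class name `HostErrorReport` and the method `describe` are hypothetical, and the `Map<InetSocketAddress, Throwable>` parameter assumes the return type of `getErrors()` in the 2.1.x driver; check the Javadoc of the version you actually use.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: formats the per-host failures carried by a NoHostAvailableException. */
final class HostErrorReport {

    private HostErrorReport() {}

    /**
     * Builds one line per host in the form "host:port -> ExceptionClass: message".
     * The map is assumed to be what NoHostAvailableException.getErrors() returns.
     */
    static String describe(Map<InetSocketAddress, Throwable> errors) {
        // Key by "host:port" in a TreeMap so the output order is stable across runs.
        Map<String, Throwable> sorted = new TreeMap<>();
        for (Map.Entry<InetSocketAddress, Throwable> entry : errors.entrySet()) {
            InetSocketAddress addr = entry.getKey();
            sorted.put(addr.getHostString() + ":" + addr.getPort(), entry.getValue());
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Throwable> entry : sorted.entrySet()) {
            Throwable cause = entry.getValue();
            sb.append(entry.getKey()).append(" -> ")
              .append(cause.getClass().getSimpleName()).append(": ")
              .append(cause.getMessage()).append('\n');
        }
        return sb.toString();
    }
}

// Assumed call site (needs the DataStax driver on the classpath, not shown here):
//   try {
//       session.execute(batch);
//   } catch (NoHostAvailableException e) {
//       System.err.print(HostErrorReport.describe(e.getErrors()));
//   }
```

Seeing the underlying exception for each host is usually what separates a network problem from something like a protocol version mismatch, which the generic top-level message hides.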

bitsprint

My issue was resolved by putting the custom TokenAwarePolicy load-balancing policy my connection was using behind a property, switching it off, and relying on the driver's default instead.

Specifically, I was trying to get a local Spring Boot app talking to a single Dockerized Cassandra instance.

        Cluster.Builder builder = Cluster.builder()
            .addContactPoints(cassandraProperties.getHosts())
            .withPort(cassandraProperties.getPort())
            .withProtocolVersion(ProtocolVersion.V4)
            .withRetryPolicy(new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE))
            .withCredentials(cassandraProperties.getUsername(), cassandraProperties.getPassword())
            .withCodecRegistry(codecRegistry);

        if (loadBalanced) {
            builder.withLoadBalancingPolicy(
                new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(localDc).build()));
        }
Matt