
I'm using the DataStax Cassandra 2.1 Java driver and performing read/write operations at a rate of ~8000 IOPS. I've configured my sessions with pooling options and use a separate session for reads and for writes, each connecting to a different node in the cluster as its contact point. This works fine for about 5 minutes, but after that I get a lot of exceptions like:

Failed with: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.0.1.123:9042 (com.datastax.driver.core.TransportException: [/10.0.1.123:9042] Connection has been closed), /10.0.1.56:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))

Can anyone help me figure out what the problem could be?

The exception suggests increasing the number of connections per host, but how high can this parameter safely be set? Also, I'm not able to set CoreConnectionsPerHost beyond 2; the driver throws an exception saying 2 is the max.

This is how I'm creating each read/write session:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.HostDistance;
    import com.datastax.driver.core.PoolingOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.ConstantReconnectionPolicy;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

    PoolingOptions poolingOpts = new PoolingOptions();
    poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
    poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200);
    poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
    poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);

    cluster = Cluster.builder()
            .withPoolingOptions(poolingOpts)
            .addContactPoint(ip)
            .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
            .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
            .build();
    Session s = cluster.connect(keySpace);
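
One thing I suspect, though this is only an assumption on my part: the driver seems to validate the core value against the current max for the same HostDistance, so raising the max before the core may be what's needed to get above 2. A minimal sketch of that ordering:

    // Assumption: setCoreConnectionsPerHost rejects values above the current
    // max for the same HostDistance, so raise the max before the core.
    PoolingOptions opts = new PoolingOptions();
    opts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200); // raise the ceiling first
    opts.setCoreConnectionsPerHost(HostDistance.REMOTE, 8);  // 8 is an arbitrary example value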

1 Answer


Your problem might not actually be in your code or the way you are connecting. If the problem appears after a few minutes, it could simply be that your cluster is becoming overloaded trying to ingest the data and cannot keep up. The typical sign of this is JVM garbage collection ("GC") messages in the Cassandra system.log file: too many small collections in quick succession, or large ones on their own, can mean that incoming clients are not being responded to, causing this kind of scenario. Verify that you do not have too many of these events showing up in your logs before you start to look at your code. Here's a good example of a large GC event:

INFO [ScheduledTasks:1] 2014-05-15 23:19:49,678 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2896 ms for 2 collections, 310563800 used; max is 8375238656

When connecting to a cluster there are some recommendations, one of which is to have only one Cluster object per real cluster, as per the article linked below (apologies if you have already studied this). A minimal code sketch of these rules follows the link:

  • Use one cluster instance per (physical) cluster (per application lifetime)
  • Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
  • If you execute a statement more than once, consider using a prepared statement
  • You can reduce the number of network roundtrips and also have atomic operations by using batches

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/fourSimpleRules.html
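
To make those rules concrete, here is a minimal sketch; the contact point, keyspace, table name, and values are illustrative rather than taken from the question:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    // One Cluster instance per physical cluster, for the application lifetime
    Cluster cluster = Cluster.builder()
            .addContactPoint("10.0.1.123") // illustrative contact point
            .build();

    // One Session shared by reads and writes; the keyspace is qualified per query
    Session session = cluster.connect();

    // Prepare once, then bind and execute many times for repeated statements
    PreparedStatement insert = session.prepare(
            "INSERT INTO myks.mytable (id, value) VALUES (?, ?)");
    session.execute(insert.bind(42, "some value"));

Note that each Session maintains its own connection pool per node, so using one Session for reads and another for writes, as in the question, doubles the number of connections held open to every host.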

As you are doing a high number of reads I'd most definitely recommend using setFetchSize, if it's applicable to your code; see the sketch after the links below.

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/queryBuilderOverview.html
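
A minimal sketch of the fetch size in practice (the query and the value of 500 are illustrative, and session is the Session from the earlier sketch):

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    // Ask the driver to page results in chunks of 500 rows rather than
    // materialising the whole result set at once
    Statement select = new SimpleStatement("SELECT * FROM myks.mytable");
    select.setFetchSize(500);

    ResultSet rs = session.execute(select);
    for (Row row : rs) {
        // subsequent pages are fetched transparently while iterating
    }

If you cannot change the statement-building code itself, the driver also supports a cluster-wide default via QueryOptions.setFetchSize, passed to Cluster.builder().withQueryOptions(...).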

For reference, here are the connection options in case you find them useful:

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/connectionsOptions_c.html

Hope this helps.

  • I have exactly the same problem, but I have no access to the code. I'm using Hadoop over Cassandra, and I do not know how to pass the fetchSize to be used by the driver (which is used by the Cassandra implementation of Hadoop). Do you know if there is a property that I could use to set the fetchSize instead of doing it in code? – user1314742 Apr 20 '15 at 12:47
  • No, unfortunately it needs to be either set in the code, or the code has to be written to pull the fetch size from a property file that can be configured at runtime – markc Nov 11 '15 at 21:36
  • I found it was a problem with the 2.0.10 release of Cassandra: http://stackoverflow.com/a/30322318/1314742 – user1314742 Mar 26 '16 at 11:29