com.datastax.driver.core.exceptions.OperationTimedOutException: [xxx.xx.xx.xx/xxx.xx.xx.xx:9042] Timed out waiting for server response

Question

We are using Apache Cassandra-v3.0.9 with com.datastax.cassandra:cassandra-driver-core:3.1.3. Our application works good all the time, but once in a week we start getting the following exception from our applications:

com.datastax.driver.core.exceptions.OperationTimedOutException: [xxx.xx.xx.xx/xxx.xx.xx.xx:9042] Timed out waiting for server response
        at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:44)
        at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:26)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.ChainedResultSetFuture.getUninterruptibly(ChainedResultSetFuture.java:62)
        at com.datastax.driver.core.NewRelicChainedResultSetFuture.getUninterruptibly(NewRelicChainedResultSetFuture.java:11)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
        at com.til.cms.graphdao.cassandra.dao.generic.CassandraICMSGenericDaoImpl.getCmsEntityMapForLimitedSize(CassandraICMSGenericDaoImpl.java:2824)
        .....
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [xxx.xx.xx.xx/xxx.xx.xx.xx:9042] Timed out waiting for server response
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:770)
        at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1374)
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
        at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)

These applications are hitting Cassandra datacenter for read requests. The data-center consists of 5 physical servers each with 2 disks, 64 GB RAM, 40 cores, 16GB heap with G1 GC.

There was no problem with Cassandra servers as per our investigation like there was no load average/iowait increase, gc pauses or nodetool/cqlsh connectivity etc. We just started getting these exceptions in our application logs until we restarted Cassandra servers. This exception was reported randomly for different Cassandra servers in the datacenter and we had to restart each of them. In normal time each of these Cassandra server servers 10K read requests/seconds and hardly 10 write requests/seconds. When we encounter this problem read requests are dramatically affected to 2-3 K/seconds.

The replication factor of our cassandra datacenter is 3 and following is way we are making connections

Cluster.builder()
    .addContactPoints(nodes)
    .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(localDatacenter).build())
    .withSpeculativeExecutionPolicy(PerHostPercentileTracker.builder(13000).build())
    .build()

EDIT: We have observed before we start getting these exceptions, we getting the following WARN level exceptions in our java application.

2018-04-03 23:40:06,456 WARN [cluster1-timeouter-0] com.datastax.driver.core.RequestHandler [RequestHandler.java:805] Not retrying statement because it is not idempotent (this message will be logged only once). Note that this version of the driver changes the default retry behavior for non-idempotent statements: they won't be automatically retried anymore. The driver marks statements non-idempotent by default, so you should explicitly call setIdempotent(true) if your statements are safe to retry. See https://docs.datastax.com/en/developer/java-driver/3.1/manual/retries/ for more details.

2018-04-04 00:04:24,856 WARN [cluster1-nio-worker-2] com.datastax.driver.core.PercentileTracker [PercentileTracker.java:108] Got request with latency of 16632 ms, which exceeds the configured maximum trackable value 13000

2018-04-04 00:04:24,858 WARN [cluster1-timeouter-0] com.datastax.driver.core.PercentileTracker [PercentileTracker.java:108] Got request with latency of 16712 ms, which exceeds the configured maximum trackable value 13000

What is written in /var/log/cassandra/system.log? May it be java.lang.OutOfMemoryError: Java heap space? — alex440, Mar 28 '18 at 14:42
@alex440 - As I already mentioned in the problem statement there was no problem at Cassandra server end. There was no GC related issue - no out of memory. cqlsh/nodetool was connecting well. — Pankaj Yadav, Mar 30 '18 at 14:35
Didn't find any particular fix for this issue, we are doing periodic restarts through a cron every 4 days to avoid this issue. — Pankaj Yadav, Jun 20 '19 at 06:07

com.datastax.driver.core.exceptions.OperationTimedOutException: [xxx.xx.xx.xx/xxx.xx.xx.xx:9042] Timed out waiting for server response

0 Answers0