I have a web service application using Cassandra 2.0 and the DataStax Java driver 2.0.2. I sometimes get the stack trace below when writing to or reading from the database, especially after the application has been idle for a while (for example, overnight). The error usually goes away when I retry, but sometimes it persists and I have to restart the web app to clear it.

I wonder if this is some sort of "stale connection" issue. However, the DataStax Java driver documentation indicates that the driver is supposed to keep the connection alive.

I searched Google for the error message and got only two (!) hits, which are related to each other. This is the answer from one of them:

Sylvain Lebresne, Apr 2: You're running into https://datastax-oss.atlassian.net/browse/JAVA-250. We'll fix it soon hopefully (I have some half-finished patch that I need to finish), but currently, if you restart a whole cluster without doing queries during the restart, it can sometimes happen that you'll get this before the cluster properly reconnect. In the meantime and as a workaround, you can always make sure to run a few trivial queries while you're doing the cluster restart to avoid it.

However, this does not look like my scenario, because we are not restarting the cluster at all. Does anyone have any insight into this error?

Stacktrace:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042 (com.datastax.driver.core.ConnectionException: [ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042] Write attempt on defunct connection))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)

1 Answer

I intermittently see what I believe is the exact same issue (Write attempt on defunct connection) on my development machine.

It seems to happen when my dev machine goes to sleep while the server is up. Obviously there's no power management in the AWS cluster you're running, but it gives you a hint: something is breaking your control connection or intermittently interrupting network connectivity between your hosts.

You should see the reconnection thread in your logs:

21:34:51.616 [Reconnection-1] ERROR c.d.driver.core.ControlConnection - [Control connection] Cannot connect to any host, scheduling retry in 2000 milliseconds

In my experience, the next request after this always succeeds.
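Since a retry usually clears the error, one stopgap is to wrap the failing call in a small retry helper. The following is only a sketch, not something the driver provides: the class name `RetryingCall`, the method `withRetry`, and its parameters are all made up for illustration, and it uses nothing but the JDK so it does not depend on any particular driver version.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of the DataStax driver): retries a call
// when it fails with one specific exception type.
public final class RetryingCall {
    private RetryingCall() {}

    /**
     * Runs the call up to maxAttempts times in total. Only exceptions that are
     * instances of retryOn trigger a retry; anything else is rethrown at once.
     * Sleeps delayMillis between attempts.
     */
    public static <T> T withRetry(Callable<T> call,
                                  Class<? extends Exception> retryOn,
                                  int maxAttempts,
                                  long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not the failure we expect to be transient
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give the driver time to reconnect
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

In the application, the callable would wrap the query (for example `session.execute(...)`) and `retryOn` would be `NoHostAvailableException.class`. A delay of around 2000 ms would match the reconnection interval in the log line above, but that value is an assumption to tune. This only hides transient failures; it does not fix whatever is breaking the connection.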

TL;DR: check for networking issues or any intermittent shutdown of servers that could break the control connection. The driver should do a better job of re-establishing broken control connections; it sounds like they're working on that under JAVA-250.