
Datastax driver: we have a serious problem with a large number of connections being held open, and we have no idea where these connections are being created.

First, some context:

  1. We use the DataStax driver to connect to Cassandra.
  2. We create a single session per app and use it for the app's lifetime.
  3. We create a keyspace per dev environment to isolate it, so people can create a keyspace whenever they need one.

The Cassandra server frequently goes down. When we investigated while it was slow or down, the cluster had a large number of incoming connections from several dev environments.

We expect 6-10 connections from a single dev environment, but we see a few thousand.

We see logs like these:

-- [com.datastax.driver.core.ControlConnection] (Cassandra Java Driver worker-6315) [Control connection] Connection error while refreshing schema Write attempt on defunct connection)

-- [com.datastax.driver.core.ControlConnection] (Cassandra Java Driver worker-6314) [Control connection] Refreshing schema for XXXX 

We see lots of Cassandra Java Driver worker threads.

XXXX is some random keyspace in the cluster.
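
One way to check from inside the app whether these worker threads are piling up is to count live threads by name prefix. This is a minimal sketch using only the JDK. The class name `DriverThreadCounter` and its methods are made up for illustration, and the prefix is copied from the log lines above. It only reports a count; it does not show where the threads are created.

```java
import java.util.Collection;

/** Hypothetical diagnostic helper: counts threads whose names start with a prefix. */
final class DriverThreadCounter {

    // Prefix copied from the log lines ("Cassandra Java Driver worker-6315").
    static final String WORKER_PREFIX = "Cassandra Java Driver worker";

    /** Counts the names in the collection that start with the given prefix. */
    static long countWithPrefix(Collection<String> threadNames, String prefix) {
        return threadNames.stream()
                .filter(name -> name.startsWith(prefix))
                .count();
    }

    /** Counts live threads in the current JVM whose names start with the prefix. */
    static long countLiveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("live driver worker threads: "
                + countLiveThreads(WORKER_PREFIX));
    }
}
```

Logging this number periodically from the running app would show whether it stays flat or keeps growing over time.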

Any idea where the problem could be?

  • I do not know of the Datastax driver as I'm using my own (libQtCassandra in C++). However, what you describe sounds like an "out of resources" case which simply breaks the node. It seems to me that the idea of using Cassandra is to have multiple nodes, you may need to add nodes? Maybe you should rewrite your code too. In my case I create one connection per client and it works as expected on that end. I have had many *out of memory* errors, though. – Alexis Wilke Sep 27 '14 at 04:04
  • I think we need to see some code. – Don Branson Sep 27 '14 at 18:10
  • Do you use custom pooling options (`Cluster.Builder.withPoolingOptions`) or the default ones? Also, could you look at the state of the TCP connections at the O/S level (`LISTEN`, `CLOSE_WAIT`, etc.)? If you use Linux, `lsof` will show you that. – Olivier Michallat Sep 29 '14 at 07:34
  • We use the default pooling options. I will dig in with `lsof` as you suggested and see if I get any clues. – Sathish Kannan Oct 06 '14 at 23:16
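
To make the `lsof` suggestion above easier to act on, the connection states can be tallied with a few lines of Java. This is only a sketch under assumptions: it expects `lsof -i` style lines where the TCP state appears in parentheses at the end of the line, and it assumes the cluster listens on 9042, the default native protocol port. The class name `TcpStateTally` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: tallies TCP states in lsof-style output for one port. */
final class TcpStateTally {

    // Assumed format: the state is the trailing "(STATE)" token on each line.
    private static final Pattern STATE = Pattern.compile("\\(([A-Z_0-9]+)\\)\\s*$");

    /** Returns a state -> count map for lines that mention the given port. */
    static Map<String, Integer> tally(List<String> lsofLines, int port) {
        // The lookahead keeps ":80" from matching ":8080".
        Pattern portPattern = Pattern.compile(":" + port + "(?!\\d)");
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            if (!portPattern.matcher(line).find()) {
                continue; // line is about a different port
            }
            Matcher m = STATE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Reads lsof output from stdin; 9042 is an assumption about the cluster.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            tally(lines, 9042).forEach((state, n) -> System.out.println(state + " " + n));
        }
    }
}
```

It could be fed the output of something like `lsof -nP -iTCP:9042` on a dev machine. A large `CLOSE_WAIT` count would point at connections that are not being closed on the client side, while thousands of `ESTABLISHED` entries would suggest that more connections are being opened than expected.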
