python cql driver - cassandra.ReadTimeout - "Operation timed out - received only 1 responses."

Question

I am using Cassandra 2.0 with the Python CQL driver.

I have created a column family as follows:

CREATE KEYSPACE IF NOT EXISTS Identification
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 1 };

USE Identification;

CREATE TABLE IF NOT EXISTS entitylookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id))
WITH
    caching=all
;

I then try to count the number of records in this CF as follows:

#!/usr/bin/env python
import argparse
import sys
import traceback
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

def count(host, cf):    
    keyspace = "identification"
    cluster = Cluster([host], port=9042, control_connection_timeout=600000000)
    session = cluster.connect(keyspace)
    session.default_timeout=600000000

    st = SimpleStatement("SELECT count(*) FROM %s" % cf, consistency_level=ConsistencyLevel.ALL)
    for row in session.execute(st, timeout=600000000):
        print "count for cf %s = %s " % (cf, str(row))
    # Release the driver's connections once the query has finished.
    cluster.shutdown()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-cf", "--column-family", default="entitylookup", help="Column Family to query")
    parser.add_argument("-H", "--host", default="localhost", help="Cassandra host")    
    args = parser.parse_args()

    count(args.host, args.column_family)

    print "fim"

The count itself is not that useful to me; it's just a test with an operation that takes a long time to complete.

Although I have set the timeout to 600000000 seconds, after less than 30 seconds I get the following error:

./count_entity_lookup.py  -H localhost -cf entitylookup 
    Traceback (most recent call last):
      File "./count_entity_lookup.py", line 27, in <module>
        count(args.host, args.column_family)
      File "./count_entity_lookup.py", line 16, in count
        for row in session.execute(st, timeout=None):
      File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1026, in execute
        result = future.result(timeout)
      File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2300, in result
        raise self._final_exception
    cassandra.ReadTimeout: code=1200 [Timeout during read request] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'data_retrieved': True, 'required_responses': 2, 'consistency': 5}

It seems the answer was found on only one replica, but this really doesn't make sense to me. Shouldn't Cassandra be able to answer the query anyway?

In the image below, you can see that the number of requests to the cluster was really low, and so was the latency. I am not sure why this is happening.

[Image: graphs of request count and latency for the cluster]

How many nodes do you have running in this cluster? From your description it sounds like just a single node, so it's not clear why the read operation would expect 2 responses. If you had a 2-node cluster with only one node online, these results would be expected. – BrianC, Jun 02 '14 at 16:39
I have two nodes in this cluster, RF=2, and the write and read consistency levels are both ALL. Both nodes are online. – mvallebr, Jun 02 '14 at 17:31
About the timeout: I found that changing the timeout in the Cassandra server configuration file is what takes effect. A client timeout can be specified, but it doesn't override the server configuration. – mvallebr, Apr 14 '16 at 09:55
Regarding the slowness itself, it had to do with the size of the requests to Cassandra. The data stored in the column families was too big, which was causing latency. – mvallebr, Apr 14 '16 at 09:57

danny · Accepted Answer · 2016-09-01T10:49:54.607

1

From the response:

'received_responses': 1, 'data_retrieved': True, 'required_responses': 2

Data was only available on one node, while the query required consistency ALL. Cassandra was not able to fulfill that request and timed out.
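As a rough illustration, the fields of the error can be decoded with a few lines of plain Python. This is only a sketch: `describe_read_timeout` is a hypothetical helper, not part of the driver, and the code table is written out by hand from the native protocol consistency codes (where 5 stands for ALL).

```python
# Consistency codes as they appear in the `consistency` field of the error.
# The table is written by hand for this sketch; it is not imported from the driver.
CONSISTENCY_NAMES = {
    0: "ANY", 1: "ONE", 2: "TWO", 3: "THREE", 4: "QUORUM", 5: "ALL",
    6: "LOCAL_QUORUM", 7: "EACH_QUORUM", 8: "SERIAL", 9: "LOCAL_SERIAL",
    10: "LOCAL_ONE",
}


def describe_read_timeout(info):
    """Summarize the info dict of a ReadTimeout (hypothetical helper)."""
    level = CONSISTENCY_NAMES.get(info["consistency"], "UNKNOWN")
    missing = info["required_responses"] - info["received_responses"]
    return "consistency=%s: %d of %d replicas answered, %d missing" % (
        level, info["received_responses"], info["required_responses"], missing)


# The info dict copied from the traceback in the question.
info = {'received_responses': 1, 'data_retrieved': True,
        'required_responses': 2, 'consistency': 5}
summary = describe_read_timeout(info)
print(summary)
```

Read this way, the coordinator was waiting for two replicas and heard back from only one before the server-side read timeout expired.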

You may change the write consistency to 'ALL' if it is required that all nodes have the data.

That would ensure all read requests can be satisfied without consistency ALL, since the guarantee would already be provided by the write request itself, though writes may fail if a node is offline.
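The trade-off between write and read consistency can be checked with the usual overlap rule: a read is guaranteed to include the latest write when the number of write acknowledgements plus the number of read responses exceeds the replication factor. The sketch below assumes a single data center and RF=2 (the value the asker gives in the comments); the function name is made up for illustration.

```python
def reads_see_latest_write(write_acks, read_responses, replication_factor):
    """Return True when any read replica set must overlap any write replica set."""
    return write_acks + read_responses > replication_factor


rf = 2  # replication factor reported by the asker in the comments

# Write at ALL (2 acks), read at ONE (1 response): the sets always overlap.
all_then_one = reads_see_latest_write(rf, 1, rf)

# Write at ONE, read at ONE: a read may hit the replica that missed the write.
one_then_one = reads_see_latest_write(1, 1, rf)

print(all_then_one, one_then_one)
```

So writing at ALL lets reads run at ONE and still see the latest data, at the cost of writes failing whenever a replica is down.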

See the documentation for an explanation of what each consistency level means.

LOCAL_QUORUM is what would be used to ensure that a majority of nodes, with respect to the replication factor, are contacted within a DC.
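To make the levels concrete, here is a minimal sketch of how many replica responses each level waits for, given the replication factor of a single data center. It is a simplification that ignores multi-DC levels such as EACH_QUORUM, and `required_responses` is a hypothetical helper named after the field in the error message.

```python
def required_responses(level, replication_factor):
    """Replica responses needed for a level, single-DC simplification."""
    if level == "ONE":
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("level not covered by this sketch: %r" % level)


for rf in (2, 3):
    counts = dict((level, required_responses(level, rf))
                  for level in ("ONE", "LOCAL_QUORUM", "ALL"))
    print(rf, counts)
```

Note that with RF=2 a quorum is also 2, so LOCAL_QUORUM only tolerates a slow or missing replica once the replication factor is 3 or more.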

edited Sep 01 '16 at 10:49

answered Sep 01 '16 at 10:14

danny


Why was required_responses 2 if the replication factor was 1 for the DC? – mvallebr Sep 01 '16 at 10:20
`ALL` means all :) Replication factor does not matter when consistency is set to all, meaning all nodes need to be contacted. Perhaps you meant to use `QUORUM` to contact majority of nodes with respect to replication factor. – danny Sep 01 '16 at 10:45
All nodes that contain the data - aka replication factor. If you have 1000 nodes in your cluster with replication factor 3, 3 nodes should be contacted for consistency ALL, not 1000. – mvallebr Sep 01 '16 at 13:14
All nodes in all DCs, yes. Is the other node in another DC and has the key? `all` would catch it. – danny Sep 01 '16 at 13:39
I had just 1 DC – mvallebr Sep 05 '16 at 16:51
According to Cassandra, there is another node in the cluster which has the data. Query requires `ALL`, one node could not respond, query failed. Where that node is, do not know. – danny Sep 08 '16 at 10:55
