14

I have a three-node Cassandra cluster and I have created one table with more than 2,000,000 rows.

When I execute this query in cqlsh (select count(*) from userdetails), I get this error:

OperationTimedOut: errors={}, last_host=192.168.1.2

When I run the count for fewer rows, or with a limit of 50,000, it works fine.

Luis González
Kaushal

7 Answers

14

count(*) actually pages through all of the data, so a `select count(*) from userdetails` without a limit would be expected to time out with that many rows. Some details here: http://planetcassandra.org/blog/counting-key-in-cassandra/

You may want to consider maintaining the count yourself, using Spark, or, if you just want a ballpark number, you can grab it from JMX.

Grabbing it from JMX can be a little tricky depending on your data model. To get the number of partitions, grab the org.apache.cassandra.metrics:type=ColumnFamily,keyspace={{Keyspace}},scope={{Table}},name=EstimatedColumnCountHistogram mbean and sum up all the 90 values (this is what nodetool cfstats outputs). That only gives you the number that exists in sstables, so to make it more accurate you can do a flush, or try to estimate the number in memtables from the MemtableColumnsCount mbean.

For a very basic ballpark number you can grab the estimated partition counts from system.size_estimates across all the ranges listed (note that this is only the number on one node). Multiply that by the number of nodes, then divide by RF.
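The arithmetic in that last step can be sketched like this (the numbers are illustrative, and the function name is my own; the per-range counts come from querying system.size_estimates on one node yourself):

```python
# Rough cluster-wide row estimate from one node's system.size_estimates figures.
# All numbers here are made up for illustration.

def estimate_total_partitions(per_range_counts, num_nodes, replication_factor):
    """Sum a single node's per-range partition estimates, scale up to the
    whole cluster, then divide by RF since each row is stored RF times."""
    one_node = sum(per_range_counts)
    return one_node * num_nodes // replication_factor

# e.g. one node reports these partitions_count values across its ranges:
print(estimate_total_partitions([120_000, 95_000, 110_000],
                                num_nodes=3, replication_factor=3))
# → 325000
```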

Chris Lohfink
  • Please, can you tell me in some detail how to get it from JMX? – Kaushal Apr 01 '15 at 15:41
  • +1 upvoted. Another alternative (to hold the `count` of the amount of records) would be to use Cassandra's neat distributed counters for that (in a separate CF, simply increment it on every insert). You'll need to retrieve the current amount of rows (iterate through it in code, for example) to provide the initial seed amount when setting this up. – ChristopheD Apr 01 '15 at 21:58
  • Kinda old for an update, but after https://issues.apache.org/jira/browse/CASSANDRA-9107 you can use the `org.apache.cassandra.metrics:type=ColumnFamily,keyspace={{Keyspace}},scope={{Table}},name=EstimatedRowCount` mbean (EstimatedPartitionCount in newer versions), which includes the number in memtables and does a better job of getting an approximate row count. – Chris Lohfink Nov 19 '15 at 14:04
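The counter-table alternative from the comment above can be sketched in CQL like this (the table and column names are made up for illustration; note that counters must live in a table whose only non-key columns are counters):

```sql
-- Hypothetical schema: keep a running row count in a separate counter table.
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    row_count  counter
);

-- On every insert into userdetails, also bump the counter:
UPDATE row_counts SET row_count = row_count + 1
 WHERE table_name = 'userdetails';

-- Reading the (approximate) total is then a cheap single-partition read:
SELECT row_count FROM row_counts WHERE table_name = 'userdetails';
```

Counter updates are not idempotent, so retried writes can drift the count slightly; it stays a good approximation rather than an exact figure.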
11

You can also increase timeout in the cqlsh command, e.g.:

cqlsh --request-timeout 120 myhost
Sergey Shcherbakov
  • this does not work for `select` statements. Check here: https://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cqlsh_commands/cqlsh.html – shruti1810 Feb 24 '22 at 10:08
4

To change the client timeout limit in Apache Cassandra, there are two techniques:

Technique 1: Modify the cqlshrc file.

Technique 2: Open the program cqlsh and modify the time specified using the client_timeout variable.

For details on how to accomplish this, please refer to the link: https://playwithcassandra.wordpress.com/2015/11/05/cqlsh-increase-timeout-limit/
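For Technique 1, the relevant cqlshrc entries look like the sketch below (the option name varies between cqlsh versions — older builds read `client_timeout`, newer ones `request_timeout`, both in seconds; check your version's documentation):

```ini
; ~/.cassandra/cqlshrc
[connection]
; newer cqlsh (seconds)
request_timeout = 120
; older cqlsh used this name instead:
; client_timeout = 120
```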

Sashank Bhogu
2

If you use cqlsh: open the script in an editor and find all occurrences of the word "timeout". Change the default value from 10 to 60 and save the script.

Matthew Murdoch
2

I'm using Cassandra 3.4 and cqlsh to get record counts. It appears that there has been a code change in 3.4. cqlsh just calls cqlsh.py. Inside of cqlsh.py there is a DEFAULT_REQUEST_TIMEOUT_SECONDS variable that defaults to 10 (seconds). I changed it to 3600 (1 hour) and now my SELECT count(*) queries work.

Misa Lazovic
Kyle Burke
0

I'm having the same problem as above when I do a count for a whole day. As a workaround, I split the count into two requests (12 hours + 12 hours), as below.

cqlsh:jw_schema1> select count(*) from flight_statistics where insert_time >= '2015-08-20 00:00:00' and insert_time <= '2015-08-20 11:59:59' ALLOW FILTERING;

 count
-------
 42528

(1 rows)
cqlsh:jw_schema1> select count(*) from flight_statistics where insert_time >= '2015-08-20 12:00:00' and insert_time <= '2015-08-20 23:59:59' ALLOW FILTERING;

 count
-------
 86580

(1 rows)
cqlsh:jw_schema1> 
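The splitting above generalizes to any number of windows; a small helper (the function name is my own) that divides a time range into equal sub-ranges, each of which bounds one COUNT query whose results you then sum client-side:

```python
from datetime import datetime

def split_windows(start, end, parts):
    """Divide [start, end) into `parts` equal sub-ranges, one per COUNT query."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]

# Two 12-hour windows covering 2015-08-20, as in the cqlsh session above:
for lo, hi in split_windows(datetime(2015, 8, 20), datetime(2015, 8, 21), 2):
    print(lo, "→", hi)
# Each (lo, hi) pair bounds one
# "select count(*) ... where insert_time >= <lo> and insert_time < <hi>"
```

Using half-open ranges (`>= lo` and `< hi`) avoids the gap between 11:59:59 and 12:00:00 that the second-granularity bounds in the session above would leave.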
Jasonw
0

I'm using Cassandra 3.11 and cqlsh to get record counts. My table has about 40,000,000 rows and I was hit by this problem. I solved it with two changes:

First, change all the timeout configs in 'cassandra.yaml' on every node:

# 3,600,000 is one hour in ms
read_request_timeout_in_ms: 3600000
range_request_timeout_in_ms: 3600000
write_request_timeout_in_ms: 3600000
counter_write_request_timeout_in_ms: 3600000
cas_contention_timeout_in_ms: 3600000
truncate_request_timeout_in_ms: 3600000
request_timeout_in_ms: 3600000
slow_query_log_timeout_in_ms: 3600000

Then restart Cassandra on every node.

Second, run 'cqlsh' with the timeout specified, like below:

cqlsh --request-timeout=3600000 <myhost>
Mohammad Rahmati