1

I have a trouble with the rows counting of very huge table in Cassandra DB.

Simple statement:

SELECT COUNT(*) FROM my.table;

Invokes the timeout error:

OperationTimedOut: errors={}, ...

I have increased client_timeout in ~/.cassandra/cqlshrc file:

[connection]
client_timeout = 900

Statement is running this time and invokes OperationTimeout error again. How can I count rows in table?

Viktor M.
  • 4,393
  • 9
  • 40
  • 71
  • The timeout might be on the server side. Maybe this [thread](http://stackoverflow.com/questions/15238970/rpc-timeout-in-cqlsh-cassandra) helps? – Ralf Feb 26 '16 at 08:46
  • SELECT COUNT(*) FROM my.table limit << upper limit >> e.g. play around with increasing the upper limit to get actua count of the records. – Gomes Feb 29 '16 at 01:35

1 Answers1

2

You could count multiple times by using split token ranges. Cassandra uses a token range from -2^63 to +2^63-1. So by splitting up this range you could do queries like that:

select count(*) from my.table where token(partitionKey) > -9223372036854775808 and token(partitionKey) < 0;
select count(*) from my.table where token(partitionKey) >= 0 and token(partitionKey) < 9223372036854775807;

Add those two counts and you'll have the total count. If those querys still not go through you can split them again into smaller token ranges.

Check out this tool, which does basically exactly that: https://github.com/brianmhess/cassandra-count

HashtagMarkus
  • 1,641
  • 11
  • 20