
I have 5 nodes connected in a Cassandra distributed data system. I have set up the replication factor as 3.

I understand that with a replication factor of 3, the data will be spread across 3 nodes based on the coordinator node's availability. When I check individual nodes, the row counts differ. I have loaded about 100k rows from a CSV into Cassandra. Does this mean I have to add up the row counts from all nodes together to get the total? I am using dsbulk to check the row count.

Am I missing something here?

Erick Ramirez
prasanna
  • A friendly reminder that this site is for getting help with coding, algorithm, or programming language problems, so I voted to have your post moved to DBA Stack Exchange. For future reference, you should post DB admin/ops questions on https://dba.stackexchange.com/questions/ask?tags=cassandra. Cheers! – Erick Ramirez Aug 22 '23 at 04:59

3 Answers


With 5 nodes, an RF of 3, and 100k rows of raw data loaded (assuming no dropped mutations), there is a grand total of 300k rows of data spread across the 5 nodes (the RF of 3 x 100k).
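As a quick sanity check of that arithmetic (illustrative Python only, not Cassandra code):

```python
# Back-of-the-envelope replica math for the scenario in the question.
nodes = 5
rf = 3            # replication factor
rows = 100_000    # rows loaded from the CSV

total_replicas = rows * rf           # every row is stored on RF nodes
avg_per_node = total_replicas / nodes

print(total_replicas)   # 300000
print(avg_per_node)     # 60000.0 (only an average; actual counts vary)
```

So a per-node count of roughly 60k is expected, not 100k / 5 = 20k.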

You mention that the data is spread based on the coordinator node's availability, but placement is actually determined by the consistent hash of each row's partition key, which decides which nodes hold the replicas.
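To illustrate, here is a toy sketch of token-based replica placement under consistent hashing. Everything here is hypothetical: it uses MD5 instead of Murmur3, one token per node instead of vnodes, and made-up node names. The point is only that the partition key, not coordinator availability, deterministically picks the replicas (the next RF nodes clockwise on the ring).

```python
import hashlib
from bisect import bisect_right

RF = 3

# Pretend each of 5 nodes owns a single token on the ring (real
# Cassandra clusters use many virtual nodes per host).
ring = sorted(
    (int(hashlib.md5(f"node{i}".encode()).hexdigest(), 16), f"node{i}")
    for i in range(5)
)
tokens = [t for t, _ in ring]

def replicas(partition_key: str, rf: int = RF):
    """Hash the key to a token, then take the next rf nodes clockwise."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    idx = bisect_right(tokens, token)
    return [ring[(idx + i) % len(ring)][1] for i in range(rf)]

# The same key always maps to the same 3 replicas:
print(replicas("user:42"))
```

Because placement is a pure function of the key, re-running the lookup for the same key always returns the same replica set.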

The likelihood is that DSBulk was run with the default consistency level of LOCAL_ONE (https://docs.datastax.com/en/dsbulk/docs/reference/driver-options.html#datastaxJavaDriverBasicRequestConsistency) and that some mutations were dropped during the load. Change the consistency level to at least LOCAL_QUORUM, and/or repair the cluster to bring it back to a consistent state.

Andrew
  • Thanks for the response. I did a full nodetool repair, but even after that there is still a slight variation in row count, about 0.04%, between the nodes. Is this acceptable? What is the threshold for an acceptable mutation rate? Some tables have billions of rows, so it is difficult to load them again. – prasanna Aug 20 '23 at 03:29

What does your exact dsbulk count command look like? Also, what is the output of running ./dsbulk --version and, via CQLSH, DESCRIBE KEYSPACE your_keyspace_name;?

You would need something like the following:

./dsbulk count -k keyspace_name -t table_name <other configs> --datastax-java-driver.basic.request.consistency LOCAL_QUORUM
Madhavan

The row count between nodes will never be exactly the same because of the way data is distributed around the cluster.

In a 5-node data centre, each node will roughly own 20% of the data. The keyword is "roughly", because the number of tokens (token ranges) owned by each node is not exactly the same -- some nodes will have a slightly larger token range while others have a slightly smaller one, though the differences are tiny in percentage terms.

On top of that, each record is distributed randomly across nodes in the cluster using an algorithm that hashes the partition key into a token value. The random distribution of the data again introduces a level of variance so each node doesn't necessarily have exactly the same amount of data.
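A toy simulation of that variance (using MD5 in place of Murmur3, and made-up keys pk-0 through pk-99999): hashing 100K keys into 5 equal buckets gives per-bucket counts near 20,000, but never exactly 20,000 each.

```python
import hashlib
from collections import Counter

# Hash 100K synthetic partition keys into 5 equal buckets and count
# how many land in each bucket. Even with a uniform hash, the counts
# hover around 20,000 without ever being identical.
counts = Counter(
    int(hashlib.md5(f"pk-{i}".encode()).hexdigest(), 16) % 5
    for i in range(100_000)
)
print(sorted(counts.values()))  # five counts near 20,000, not identical
```

The spread you see here is the same statistical effect that makes per-node row counts differ in a real cluster.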

With just 100K partitions, the data will not be distributed as evenly as you might expect. It is not until you have billions of partitions that you will see closer-to-equal distribution.

Remember that for the default Murmur3Partitioner, the possible hash values (tokens) for partition keys range from -2^63 to 2^63 - 1, roughly 1.8 x 10^19 values in total -- that's a very, VERY large number. By comparison, 100K is a vanishingly tiny fraction of that. Cheers!
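A quick back-of-the-envelope in Python on how sparsely 100K keys sample that token space:

```python
# The Murmur3 token range -2**63 .. 2**63 - 1 contains 2**64 tokens.
token_count = 2**64
loaded = 100_000

fraction = loaded / token_count
print(f"{fraction:.2e}")  # 5.42e-15 -- 100K keys barely sample the space
```

At that density, the law of large numbers has not yet smoothed out the per-node counts, which is why small loads look more uneven than huge ones.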

Erick Ramirez
  • Thanks for the explanation. So according to my understanding, if the data is not equal across all the nodes, we can assume the variance will always be there and that by default the data is inconsistent across all nodes. – prasanna Aug 23 '23 at 18:20
  • No, don't confuse (1) unbalanced data distribution with (2) replica consistency, because those are two completely different things unrelated to each other. Terms like "inconsistent" have a different meaning in Cassandra to what you're referring to. Cheers! – Erick Ramirez Aug 24 '23 at 01:20