Cassandra - Data not replicating across all nodes

Question

I'm running the same query on each of my three nodes. On one node it returns ten rows, while on the other two it returns only two rows.

The replication factor is set to 3:

 keyspace_name | durable_writes | replication
---------------+----------------+-------------------------------------------------------------------------------------
    table name |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}

Nodetool Netstats:

nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 16519
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         1             13         4
Small messages                  n/a         0         320422         4
Gossip messages                 n/a         0       12972040       470

nodetool repair has been run on all of the nodes.

@CarlosMonroyNieblas The consistency level was set to ONE. I've set it to QUORUM and this has corrected the problem. Should the Java code be adjusted to set the consistency level to QUORUM, or is this something I can trigger server side? Thanks for the help! – Brian Johnson, Oct 01 '19 at 02:20
Also, this is only occurring in this particular environment. The other environments have the same number of rows on each node, with a consistency level of ONE. Any insight there? – Brian Johnson, Oct 01 '19 at 02:29
You should set the consistency level in your code so that every request from the application uses the same consistency level. – LetsNoSQL, Oct 01 '19 at 04:28

score 0 · Accepted Answer · answered Oct 01 '19 at 05:05

Based on your comment, the issue could be prevented by using a consistency level of QUORUM or higher. Keep in mind that raising the consistency level may affect both performance and resiliency. For instance, a consistency level of ALL ensures that reads always return accurate data, but if any one instance in the cluster has a problem, queries will fail because the consistency level cannot be satisfied. The best consistency level depends on your use case and your SLAs.

How often have you executed repairs (nodetool repair) on your cluster? Repairs address the root cause of the differing data returned by each node.
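
To make the consistency trade-off above concrete, here is a minimal sketch of the replica arithmetic for a replication factor of 3. It does not talk to Cassandra at all; `QuorumMath` and its method names are hypothetical helpers invented for illustration. It assumes the usual rule of thumb that a read is guaranteed to overlap the latest acknowledged write only when the number of replicas read plus the number of replicas written exceeds the replication factor.

```java
// Sketch only: QuorumMath is a hypothetical helper, not part of Cassandra or any driver.
final class QuorumMath {

    // Replicas that must respond at QUORUM: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to touch at least one replica that acknowledged
    // the latest write when readReplicas + writeReplicas > replicationFactor.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Number of replicas that may be unavailable while the level can still be met.
    static int toleratedFailures(int requiredReplicas, int replicationFactor) {
        return replicationFactor - requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;            // replication factor from the question
        int q = quorum(rf);    // replicas needed at QUORUM

        System.out.println("Replicas needed at QUORUM: " + q);
        System.out.println("ONE reads + ONE writes overlap: " + readsSeeLatestWrite(1, 1, rf));
        System.out.println("QUORUM reads + QUORUM writes overlap: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("Node failures tolerated at QUORUM: " + toleratedFailures(q, rf));
        System.out.println("Node failures tolerated at ALL: " + toleratedFailures(rf, rf));
    }
}
```

With three replicas, QUORUM needs two of them, so it still works with one node down, whereas ALL tolerates no failures. Reading and writing at ONE gives no overlap guarantee, which is consistent with different nodes returning different row counts until a repair brings them back in sync.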

Carlos Monroy Nieblas

I'm running `nodetool repair` on demand currently. This would be in my lower environments only. Do you believe setting up a cron to run nodetool repair on the nodes periodically would suffice for non-production environments? – Brian Johnson Oct 01 '19 at 17:31
I would recommend that you take a look at cassandra-reaper (http://cassandra-reaper.io/). It is an open-source tool that lets you define repair schedules and their frequency, and lets you schedule repairs at the keyspace level. That way you can split the task into small subtasks that take less time to complete. – Carlos Monroy Nieblas Oct 01 '19 at 19:25
