
I am very new to Cassandra and I am wondering, is it possible for Cassandra to return an inconsistent value?

For example, say we have a six-node cluster.

LOCAL_QUORUM = (replication_factor/2) + 1

This would give us a Local Quorum of 4. So for a simple write, four of six nodes have to respond, which means that four nodes would have the most recent value.

From my understanding, the two nodes that were not updated eventually get updated through Gossip Protocol.

If so what happens if a client reads from one of the two nodes that were not updated before the protocol occurs? Are they at risk of getting a stale value?

How does read repair play into all this?

*Also, a quick side note: not that you would ever do this, but if you set the consistency level equal to the replication factor, does that essentially operate the same as 2PC (two-phase commit) on the back end?

2 Answers


is it possible for Cassandra to return an inconsistent value?

The answer is: Yes it is.

It depends on how you are going to set read/write consistency level.

If so what happens if a client reads from one of the two nodes that were not updated before the protocol occurs? Are they at risk of getting a stale value?

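If you set the consistency level of read operations to ONE or TWO, there is still a risk of getting a stale value. Why? Because Cassandra returns a value to the client as soon as it gets responses from the specified number of nodes. In your example, with six replicas and a write acknowledged by four, a read of one or two replicas can land entirely on the two that have not been updated yet.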

Cassandra is very flexible: you can configure it to suit your application's needs. To maintain strong consistency, you can always follow this rule:

Reliability of read and write operations depends on the consistency used to verify the operation. Strong consistency can be guaranteed when the following condition is true:

R + W > N

where R is the consistency level of read operations

W is the consistency level of write operations

N is the number of replicas
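As a quick sanity check, the rule can be sketched in a few lines of Python. This is only arithmetic, not a call into any Cassandra API; the helper names `quorum` and `is_strongly_consistent` are made up for illustration, and the replication factor of 6 is taken from the question's example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """True when any R replicas read must overlap any W replicas written."""
    return r + w > n


rf = 6           # replication factor assumed from the question
q = quorum(rf)   # 4 replicas for RF = 6

print(q)
print(is_strongly_consistent(q, q, rf))  # QUORUM reads + QUORUM writes: 4 + 4 > 6
print(is_strongly_consistent(1, q, rf))  # ONE reads + QUORUM writes: 1 + 4 is not > 6
```

With quorum writes, a read at ONE can still land on a replica that missed the write, while a quorum read is guaranteed to include at least one replica that has it.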

For more detail, check out: consistent read and write operations

How does read repair play into all this?

In read repair, Cassandra sends a digest request to each replica not directly involved in the read. Cassandra compares all replicas and writes the most recent version to any replica node that does not have it. If the query's consistency level is above ONE, Cassandra performs this process on all replica nodes in the foreground before the data is returned to the client. Read repair repairs any node queried by the read. This means that for a consistency level of ONE, no data is repaired because no comparison takes place. For QUORUM, only the nodes that the query touches are repaired, not all nodes.
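To make the "only the nodes that the query touches are repaired" point concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not Cassandra's implementation: the `Replica` class and `read_with_repair` function are hypothetical, each replica holds a single value, and "most recent" is decided by comparing integer write timestamps.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    value: str
    timestamp: int  # write timestamp; higher means newer


def read_with_repair(contacted):
    """Return the newest value among the contacted replicas, repairing stale ones."""
    newest = max(contacted, key=lambda r: r.timestamp)
    for replica in contacted:
        if replica.timestamp < newest.timestamp:
            # Write the most recent version back to the stale replica.
            replica.value = newest.value
            replica.timestamp = newest.timestamp
    return newest.value


# Six replicas: four acknowledged the latest write, two have not received it yet.
replicas = [Replica(f"n{i}", "new", 2) for i in range(1, 5)]
replicas += [Replica("n5", "old", 1), Replica("n6", "old", 1)]

# A read at ONE that happens to hit n5: nothing to compare against, so it is stale.
stale_read = read_with_repair([replicas[4]])

# A QUORUM read (4 of 6) that includes both stale replicas: newest value wins.
quorum_read = read_with_repair(replicas[2:6])

print(stale_read, quorum_read, replicas[4].value, replicas[5].value)
```

In this model the read at ONE returns the old value and repairs nothing, while the quorum read returns the new value and brings n5 and n6 up to date as a side effect.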

Check these links for more details:

  1. Read Repair: repair during read path

  2. Repairing nodes

MD Ruhul Amin

Welcome to the Cassandra world.

is it possible for Cassandra to return an inconsistent value?

Yes, Cassandra by nature takes an "eventually consistent" approach, so if you read at a low consistency level such as ONE (or write at ANY, which applies to writes only), the risk of getting an inconsistent value back increases. You can raise the level to ALL to ensure that the information will be consistent, but you'll sacrifice performance and resiliency. The levels used in the application will depend on your use case.

For example, say we have a six-node cluster.

LOCAL_QUORUM = (replication_factor/2) + 1

The replication factor is independent of the number of nodes in the cluster; the rule of thumb is that the replication factor should not be greater than the number of nodes.

Assuming that you are using a replication factor of 6 in the six-node cluster:

This would give us a Local Quorum of 4. So for a simple write, four of six nodes have to respond, which means that four nodes would have the most recent value.

From my understanding, the two nodes that were not updated eventually get updated through Gossip Protocol.

The mechanism that gets a missed write to the replicas that did not receive it is hinted handoff, not gossip. The gossip protocol is what nodes use to report node state (their own and that of other nodes); some of those states are "up", "down", "healthy", "joining", "leaving", etc.
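The idea behind hinted handoff can be sketched with a toy model: the coordinator keeps a hint for each replica that was down during a write and replays it once that replica is back. This is a simplified illustration, not Cassandra's code; the `Node` and `Coordinator` classes are hypothetical, and real clusters add details (such as a limited window for storing hints) that are ignored here.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target node, key, value) for writes that could not be delivered

    def write(self, key, value):
        """Write to every live replica, store a hint for each one that is down."""
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back up."""
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
coordinator = Coordinator([n1, n2, n3])

n3.up = False
acks = coordinator.write("user:42", "alice")  # only n1 and n2 acknowledge
missing_before = "user:42" not in n3.data

n3.up = True  # in a real cluster, gossip is how the coordinator would learn this
coordinator.replay_hints()
print(acks, missing_before, n3.data)
```

Until the hint is replayed, a read served only by n3 would not see the write, which is exactly the stale-read window the question asks about.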

If so what happens if a client reads from one of the two nodes that were not updated before the protocol occurs? Are they at risk of getting a stale value?

You will want to read about Cassandra's read path. TL;DR: this depends on the replication factor as well as the consistency level of the read operation. You can decrease the risk of reading stale data by raising the consistency level, at the cost of resiliency and performance.

Carlos Monroy Nieblas