13

I am reading this post on read operations and consistency level in Cassandra. According to this post:

For example, in a cluster with a replication factor of 3, and a read consistency level of QUORUM, 2 of the 3 replicas for the given row are contacted to fulfill the read request. Supposing the contacted replicas had different versions of the row, the replica with the most recent version would return the requested data. In the background, the third replica is checked for consistency with the first two, and if needed, the most recent replica issues a write to the out-of-date replicas.

So even with a consistency level of QUORUM, it is not guaranteed that you won't get a stale read. According to the above paragraph, if the third replica has the latest timestamp, the coordinator has already returned the newer of the two replicas it queried. But that is not the latest version, since the third replica has the latest timestamp.

Casey Falk
brain storm

1 Answer

22

A QUORUM read does not by itself guarantee that you see the latest data. What guarantees consistency is the following inequality:

(WRITE CL + READ CL) > REPLICATION FACTOR

In terms of consistency levels, the minimal write/read combinations that guarantee consistent reads are:

WRITE ALL + READ ONE
WRITE ONE + READ ALL
WRITE QUORUM + READ QUORUM
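The condition (WRITE CL + READ CL) > REPLICATION FACTOR can be checked mechanically. The following is a minimal sketch, not Cassandra code: the function names `replicas_required` and `reads_see_latest_write` are made up for illustration, and it assumes the usual numeric meaning of the levels (ONE = 1, QUORUM = floor(RF / 2) + 1, ALL = RF).

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level.

    Assumed mapping: ONE -> 1, QUORUM -> floor(rf / 2) + 1, ALL -> rf.
    """
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def reads_see_latest_write(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when (WRITE CL + READ CL) > REPLICATION FACTOR holds."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


if __name__ == "__main__":
    rf = 3
    pairs = [("ALL", "ONE"), ("ONE", "ALL"), ("QUORUM", "QUORUM"), ("ONE", "QUORUM")]
    for write_cl, read_cl in pairs:
        ok = reads_see_latest_write(write_cl, read_cl, rf)
        print(f"WRITE {write_cl} + READ {read_cl}: consistent={ok}")
```

With RF = 3 the first three pairs satisfy the condition, while WRITE ONE + READ QUORUM gives 1 + 2 = 3, which is not greater than 3.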

As the post says, if you have a replication factor of 3 and you wrote with CL ONE, only 1 node is guaranteed to have the fresh data, while the other 2 might still have old data. A CL QUORUM read might then be served by those other 2 nodes, so old data goes back to the client. But since the coordinator sent the read request to all nodes (and only waited for 2 before responding to the client), it will find out which node has the freshest data and update the others.

On the other hand, with RF 3, if you write with QUORUM, at least 2 nodes will have the fresh data. A read with CL QUORUM contacts 2 of the 3 nodes, so at least one of those two has the fresh data.
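The overlap argument can be verified by brute force. This is a toy sketch under simplifying assumptions: replicas are just the integers 0..RF-1, a write lands on exactly W of them, a read contacts exactly R of them, and the helper name `always_overlap` is hypothetical. It enumerates every possible pair of write set and read set and checks whether they always share a node.

```python
from itertools import combinations


def always_overlap(rf: int, w: int, r: int) -> bool:
    """True if every set of w written replicas intersects every set of r read replicas."""
    nodes = range(rf)
    return all(
        set(written) & set(contacted)
        for written in combinations(nodes, w)
        for contacted in combinations(nodes, r)
    )


if __name__ == "__main__":
    # RF 3, write QUORUM (2) + read QUORUM (2): the read always hits a fresh node.
    print(always_overlap(3, 2, 2))
    # RF 3, write ONE (1) + read QUORUM (2): the read can miss the only fresh node.
    print(always_overlap(3, 1, 2))
```

The second case fails because the read can pick exactly the 2 replicas that the CL ONE write has not reached yet.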

Carlo Bertuccini
  • but again, what is sent to client by co-ordinator is old data when write with CL1 and read with quorum – brain storm Jul 30 '14 at 17:53
  • Yes, because in this situation you are not satisfying the inequality: with RF = 3, CL ONE = 1 and CL QUORUM = 2, so (1 + 2) is not greater than 3, it is just equal to 3 – Carlo Bertuccini Jul 30 '14 at 17:56
  • Although the client gets stale data in this case, if he queries again he will get the latest update (because an update has occurred in the background) – brain storm Jul 30 '14 at 17:59
  • The client "should" receive the fresh data, and in most cases it will. But imagine this case: the coordinator sends the response back to the client because it reached the required CL, but it is still waiting for other nodes that are very busy and slow to answer. Now you repeat the read; your "new" coordinator queries the same nodes and again gets a fast answer from the minimum needed to reach the CL. Here you can still get old data, because the read repair from the first read will only happen after your second read request. It's an "edge case", but it might happen. – Carlo Bertuccini Jul 30 '14 at 18:04
  • how do you say read repair of first read will happen only after second read request? I don't understand this – brain storm Jul 30 '14 at 18:15
  • Also read_repair_chance with default value of 0.1 does not guarantee that a read repair will happen for every read request – brain storm Jul 30 '14 at 18:18
  • The problem I'm talking about does not depend on the read_repair_chance value. Before performing a read repair, a coordinator waits for a response from all the nodes it contacted. Imagine an RF 3 cluster (N = node) where N1, N2 and N3 own the key you are querying. Your coordinator is N5 and you read with CL ONE: N5 contacts N1, N2 and N3 and waits for their replies. N1 replies, so N5 sends you the response, because the CL is ONE. N5 is still waiting for a response from N2 and N3 (so the read repair surely has not happened yet). Now you perform the same query and N6 is your new coordinator. N6 contacts N1, N2 and N3, N1 answers first (again), so you receive old data (again) – Carlo Bertuccini Jul 30 '14 at 18:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/58386/discussion-between-brain-storm-and-carlo-bertuccini). – brain storm Jul 30 '14 at 18:48
  • can you look this when you get time: http://stackoverflow.com/questions/25101982/reverse-range-query-using-astyanax – brain storm Aug 03 '14 at 16:41