
Scenario:

  1. Client sends a write request to a coordinator node
  2. Replication factor is 3 and Read/Write Consistency level is QUORUM.
  3. The coordinator sends the request to nodes A, B and C. The data is committed to node A, but nodes B and C go down immediately after receiving the request from the coordinator. The coordinator sends a timeout exception to the client because it has not received an ack from nodes B and C within the allotted time. The data on node A is now inconsistent with the data on nodes B and C. Based on my understanding, nodes B and C will be updated with the value on node A during read repair. So we had a timeout exception here, but the new value is eventually written to all the nodes.
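To make the scenario concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Cassandra driver: `Replica`, `WriteTimeout` and `coordinator_write` are hypothetical names, and `up=False` stands in for "the node died before committing the write".

```python
from dataclasses import dataclass, field


def quorum(replication_factor: int) -> int:
    """Number of replicas that must ack at consistency level QUORUM."""
    return replication_factor // 2 + 1


@dataclass
class Replica:
    name: str
    up: bool = True                      # False = node died before committing
    data: dict = field(default_factory=dict)

    def write(self, key, value) -> bool:
        """Store the value and return True (ack), or return False if down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


class WriteTimeout(Exception):
    """Hypothetical stand-in for the timeout the coordinator reports."""

    def __init__(self, required: int, received: int):
        super().__init__(f"required {required} acks, received {received}")
        self.required = required
        self.received = received


def coordinator_write(replicas, key, value) -> int:
    """Send the write to every replica and fail if fewer than a quorum ack."""
    required = quorum(len(replicas))
    received = sum(1 for r in replicas if r.write(key, value))
    if received < required:
        # No rollback: replicas that already stored the value keep it.
        raise WriteTimeout(required, received)
    return received


a, b, c = Replica("A"), Replica("B", up=False), Replica("C", up=False)
try:
    coordinator_write([a, b, c], "k", "new")
    outcome = "ok"
except WriteTimeout as exc:
    outcome = f"timeout ({exc.received}/{exc.required} acks)"
```

In this model the client sees a timeout, yet node A still holds the new value, which is exactly the ambiguous state described in step 3.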

There could be other timeout exceptions where the new data has not been written to any of the nodes.

So it appears that the developer is expected to handle the timeout exception in the code, which may not be straightforward in all cases (because the new value may be written in some cases and not in others, and the developer has to check for that during a retry after the timeout).

I'm just learning Cassandra. So if my understanding is not correct, please correct me.

Some of you may say that this happens in a relational DB too, but it's a rare occurrence there since it's not a distributed system.

Here are some articles that I found, but they do not address my question specifically.

What happens if a coordinator node goes down during a write in Apache Cassandra?

https://www.datastax.com/blog/2012/08/when-timeout-not-failure-how-cassandra-delivers-high-availability-part-1

Sara

2 Answers


If the data is written, it is consistent, even if nodes B and C did not send the ACK: when data is received by a node, it first goes to the commit log, and if the node crashes, it will replay the mutation as soon as it starts up again.

As the second article said, it is more like an InProgressException than a TimedOutException.

On the client side, if you get a TimedOutException you cannot be 100% sure that the data was written, but it may have been.

In your case, if the write was received by nodes B and C, the data is consistent even if they did not send an ACK. Even if just one of the two nodes received it, the data is still consistent because of the use of QUORUM.
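The QUORUM argument can be checked by brute force. The sketch below is plain Python arithmetic on replica names, not Cassandra code: it enumerates every possible quorum for a replication factor of 3 and verifies that any read quorum shares at least one node with any write quorum. It also shows which read quorums would miss a value that reached only node A.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


replicas = ["A", "B", "C"]
q = quorum(len(replicas))

# Every possible set of replicas that can form a quorum (reads or writes).
quorums = list(combinations(replicas, q))

# With QUORUM reads and QUORUM writes, every read set intersects every write set.
all_overlap = all(set(w) & set(r) for w in quorums for r in quorums)

# If the value reached only node A, these read quorums would not see it.
stale_reads = [r for r in quorums if "A" not in r]
```

So once two of the three replicas hold the write, every QUORUM read sees it; while only A holds it, a read served by B and C can still return the old value.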

On the cluster side, there are several mechanisms that help Cassandra become more consistent: hinted handoff, read repair, and repair.

For a better understanding, it may be worth taking a look at:

Write path:

https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/dml/dml_write_path_c.html

Hinted handoff:

https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/dml/dml_about_hh_c.html

Read repair:

https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesReadRepair.html

Saifallah KETBI
  • When I mentioned that 'nodes B and C go down immediately after receiving the request from the coordinator', I meant to say that the data has not been written/committed to nodes B and C. If nodes B and C do not recover within the timeout period, the coordinator is going to send an exception to the client even though the data has been committed to node A and that is not correct in my opinion. Since the write consistency level is QUORUM and that was not satisfied, Cassandra should ideally have a good mechanism to rollback the data since it's sending an exception to the client. – Sara Aug 26 '20 at 17:27
  • I ran out of space in the previous comment, so I am continuing here. On the other hand, if Cassandra is confident that the data on nodes A, B and C will eventually be consistent, then it should not return an exception to the client, which would indicate that no retry action is required from the client. Note: whether the exception is called InProgressException or TimedOutException is more a matter of semantics. An exception tells the client that the write may not have been successful. – Sara Aug 26 '20 at 17:29
  • That is the point: there is no rollback, no transaction, and no 100% certainty in this kind of situation. In that case you can consider your write a failure, but not a failure in the ACID-database sense, because the data might have been written; the view is different when thinking about eventual consistency. A node that receives a write sends it straight to the commit log, unless it crashes before that, in which case you may consider the node as not having received the data. Otherwise, it will be written. – Saifallah KETBI Aug 27 '20 at 00:24

Thanks for the response. It still does not help answer the question from an end user/developer perspective since I need to write the code to handle the exception.

For whatever it's worth, I found the below article on DataStax. https://www.datastax.com/blog/2014/10/cassandra-error-handling-done-right

If you refer to the sections on 'WriteTimeOutException' and 'Non-idempotent operations', you can see that the end user is expected to retry after receiving the exception. If it's an idempotent operation, then no additional code is required on the application side. Things are not so straightforward for non-idempotent operations. Cassandra assumes that most write operations are idempotent, and I don't necessarily agree with this. The business rules depend on the application.

Example of non-idempotent operations: update table set counter = counter + 1 where key = 'xyz' or update table set commission = commission * 1.02 where key = 'abc'
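The difference is easy to see in a small simulation. This is a hedged Python sketch with hypothetical names (`FlakyStore`, `AmbiguousTimeout`, `write_with_retry`), not driver code: the store applies the first write and then reports a timeout anyway, mimicking a write that landed but was never acknowledged.

```python
class AmbiguousTimeout(Exception):
    """Hypothetical: the write may or may not have been applied."""


class FlakyStore:
    """Applies every mutation, but reports a timeout on the first call."""

    def __init__(self, value: int):
        self.value = value
        self.fail_next = True

    def apply(self, mutation) -> None:
        self.value = mutation(self.value)   # the write actually lands
        if self.fail_next:
            self.fail_next = False
            raise AmbiguousTimeout()        # ...but the client is told it timed out


def write_with_retry(store: FlakyStore, mutation, attempts: int = 2) -> None:
    """Blindly retry the same mutation after a timeout."""
    for _ in range(attempts):
        try:
            store.apply(mutation)
            return
        except AmbiguousTimeout:
            continue
    raise AmbiguousTimeout()


# Idempotent: "set counter = 6" gives the same result however often it runs.
idempotent = FlakyStore(5)
write_with_retry(idempotent, lambda _old: 6)

# Non-idempotent: "set counter = counter + 1" is applied twice by the retry.
non_idempotent = FlakyStore(5)
write_with_retry(non_idempotent, lambda old: old + 1)
```

After the retry the idempotent store holds 6, as intended, while the non-idempotent one holds 7: the increment was counted twice.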

The article gives some recommendations on how to handle non-idempotent operations using CAS/lightweight transactions in the 'Non-idempotent operations' section. This makes things complicated/ugly in the client code, especially when you have a lot of DML in the code.
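As a rough illustration of the compare-and-set idea, here is an in-memory Python sketch. `Row`, `Timeout` and `increment_once` are hypothetical names; in Cassandra the conditional write would be a lightweight transaction such as an `UPDATE ... IF column = ?` statement on an ordinary (non-counter) column. The client reads the old value once, computes the new one, and only ever asks for "change old to new", so a retry after a timeout cannot apply the change twice.

```python
class Timeout(Exception):
    """Hypothetical: the conditional write may or may not have been applied."""


class Row:
    """One value with a conditional update that times out once."""

    def __init__(self, value: int):
        self.value = value
        self.timeouts_left = 1

    def compare_and_set(self, expected: int, new: int) -> bool:
        applied = self.value == expected
        if applied:
            self.value = new
        if self.timeouts_left > 0:
            self.timeouts_left -= 1
            raise Timeout()          # outcome hidden from the caller
        return applied


def increment_once(row: Row, attempts: int = 3) -> bool:
    """Increment the value exactly once, even if a timeout forces a retry."""
    expected = row.value             # read the current value a single time
    new = expected + 1
    for _ in range(attempts):
        try:
            if row.compare_and_set(expected, new):
                return True
            # Not applied now: treat it as success only if an earlier,
            # timed-out attempt already stored our value.
            return row.value == new
        except Timeout:
            continue
    return False


row = Row(5)
succeeded = increment_once(row)
```

One caveat of this sketch: if another writer happens to store the same new value, the final check cannot tell the two cases apart, so a real application may want a unique write identifier in the condition as well.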

Even though it's NOT the answer that I was looking for, it appears that there's no better way at least for now.

Sara