Scenario:
- A client sends a write request to a coordinator node.
- The replication factor is 3, and the read/write consistency level is QUORUM.
- The coordinator sends the request to nodes A, B and C. The data is committed on node A, but nodes B and C go down immediately after receiving the request.
- The coordinator does not receive an ack from B or C within the write timeout. QUORUM with a replication factor of 3 needs 2 acks, and it has only 1, from A. So it returns a timeout exception to the client. The data on node A is now inconsistent with the data on nodes B and C.

Based on my understanding, nodes B and C will be updated with node A's value through read repair once they come back up. So the client got a timeout exception, yet the new value is eventually written to all the nodes.
There could also be other timeout exceptions in which the new data has not been written to any of the nodes at all.
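To check this reasoning, here is a toy model of the scenario. It is a simplified sketch under stated assumptions, not how Cassandra is implemented. Replicas apply writes instantly, and the coordinator just counts acks. A QUORUM read contacts an explicitly chosen set of replicas, returns the value with the newest timestamp, and repairs the stale replicas it contacted. Hinted handoff and anti-entropy repair (`nodetool repair`), which in real Cassandra also help B and C converge, are left out. All names (`Replica`, `coordinate_write`, `quorum_read`, `run`) are hypothetical.

The model covers both timeout cases described above. The client receives the same exception whether node A applied the write or no node did. A QUORUM read that happens to contact only B and C still returns the old value.

```python
from dataclasses import dataclass


def quorum(replication_factor: int) -> int:
    # QUORUM = floor(RF / 2) + 1, so RF = 3 needs 2 replicas
    return replication_factor // 2 + 1


@dataclass
class Replica:
    name: str
    value: str = "old"
    timestamp: int = 0
    up: bool = True


class WriteTimeout(Exception):
    """Hypothetical stand-in for the driver's write-timeout error."""


def coordinate_write(replicas, value, timestamp, crash_before_apply):
    """Send the mutation to every replica and wait for QUORUM acks."""
    required = quorum(len(replicas))
    acks = 0
    for r in replicas:
        if r.name in crash_before_apply:
            r.up = False  # replica dies before applying or acknowledging
            continue
        r.value, r.timestamp = value, timestamp
        acks += 1
    if acks < required:
        raise WriteTimeout(f"received {acks} of {required} required acks")


def quorum_read(replicas, contact):
    """Read from the named replicas; newest timestamp wins, stale ones are repaired."""
    chosen = [r for r in replicas if r.name in contact and r.up]
    if len(chosen) < quorum(len(replicas)):
        raise RuntimeError("not enough live replicas contacted for QUORUM")
    newest = max(chosen, key=lambda r: r.timestamp)
    for r in chosen:
        if r.timestamp < newest.timestamp:  # simplified read repair
            r.value, r.timestamp = newest.value, newest.timestamp
    return newest.value


def run(crashing):
    """Write 'new' at timestamp 1 while the given replicas crash, then revive them."""
    replicas = [Replica("A"), Replica("B"), Replica("C")]
    try:
        coordinate_write(replicas, "new", timestamp=1, crash_before_apply=crashing)
    except WriteTimeout as exc:
        print(f"client sees WriteTimeout: {exc}")
    for r in replicas:
        r.up = True  # the crashed replicas come back (still holding old data)
    return replicas


replicas = run({"B", "C"})                # the scenario from the question
print(quorum_read(replicas, {"B", "C"}))  # old: neither contacted replica has the write
print(quorum_read(replicas, {"A", "B"}))  # new: A's newer timestamp wins, B is repaired
print([r.value for r in replicas])        # ['new', 'new', 'old']: C is still stale
run({"A", "B", "C"})                      # same exception, but nothing was written
```
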
So it appears that the developer is expected to handle the timeout exception in code. That is not always straightforward: the new value may have been written in some cases and not in others, and the developer has to account for that when retrying after the timeout.
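One common way to make such a retry safe applies only if the write is idempotent. That means a plain INSERT/UPDATE of fixed values, not a counter update or a list append. The client picks the write timestamp once and resends the identical mutation. In CQL, a client-chosen timestamp can be set with `USING TIMESTAMP`. Because replicas reconcile by timestamp (last write wins), applying the same mutation twice leaves the same result whether or not the first attempt landed.

The sketch below is a minimal illustration under those assumptions. `send_write`, `flaky_send` and the `WriteTimeout` class are hypothetical stand-ins, not a real driver API. Real drivers such as the DataStax Python driver have their own `WriteTimeout` exception and configurable retry policies.

```python
import time
from typing import Callable


class WriteTimeout(Exception):
    """Stand-in for a driver's write-timeout exception (hypothetical)."""


def write_with_retry(send_write: Callable[[str, int], None], value: str,
                     max_attempts: int = 3, backoff_s: float = 0.05) -> int:
    """Retry an idempotent write after timeouts; return the attempt that succeeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Choose the write timestamp ONCE (microseconds since the epoch, the unit
    # Cassandra uses) so every retry carries exactly the same mutation.
    timestamp_us = time.time_ns() // 1_000
    for attempt in range(1, max_attempts + 1):
        try:
            send_write(value, timestamp_us)
            return attempt
        except WriteTimeout:
            if attempt == max_attempts:
                raise  # give up: the write may or may not have been applied
            time.sleep(backoff_s * attempt)  # simple linear backoff
    raise AssertionError("unreachable")


# Hypothetical single replica that applies the write but "loses" the ack twice.
replica = {"value": "old", "timestamp": 0}
calls = [0]


def flaky_send(value: str, timestamp_us: int) -> None:
    calls[0] += 1
    if timestamp_us >= replica["timestamp"]:  # last write wins
        replica.update(value=value, timestamp=timestamp_us)
    if calls[0] < 3:
        raise WriteTimeout("no ack within the write timeout")


attempts = write_with_retry(flaky_send, "new", backoff_s=0.0)
print(attempts, replica["value"])  # 3 new
```

Fixing the timestamp has a second benefit. If another client writes a newer value between retries, the retried write carries the older timestamp and cannot overwrite it.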
I'm just learning Cassandra. So if my understanding is not correct, please correct me.
Some of you may say that this can happen in a relational DB too, but it's rare there, since a typical relational DB is not a distributed system.
Here are some articles that I found, but they do not address my question specifically:
What happens if a coordinator node goes down during a write in Apache Cassandra?