1

By design we can write to any node in a Cassandra cluster. In my situation the replica set has 2 nodes. When I make a write operation to node A but that node is unavailable, do I have to catch the exception and then re-send the write to node B manually?

In MongoDB, the driver has "Retryable Writes" to automatically send the write to another node if the primary node is down. Does Cassandra have this feature?

– user2877989

1 Answer


When you write to Cassandra you specify the consistency level you wish to write with - ranging from ANY, which provides no guarantees, up to ALL, which requires that all replicas in all DCs acknowledge back to the co-ordinator.
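
As a minimal sketch with the DataStax Java driver 4.x (the keyspace, table and values here are hypothetical, and the session falls back to its configured contact points):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    // Pin the consistency level on the statement itself.
    SimpleStatement write = SimpleStatement
            .newInstance("INSERT INTO my_ks.users (id, name) VALUES (?, ?)", 42, "alice")
            .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM); // or ANY, ONE, ALL, ...

    try (CqlSession session = CqlSession.builder().build()) {
        session.execute(write); // a single request; the driver chooses the co-ordinator
    }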

This write is sent to a single node - chosen by your load balancing policy. That node acts as the co-ordinator for the whole operation and returns a single response of success / exception. Your application does not have to send the write to multiple nodes itself; it sends it to 1 node (any node can be used), which co-ordinates the write to the replicas.

In a normal scenario - using local_quorum for a write with the very common replication factor of 3 - as long as the co-ordinator has 2 of the 3 replicas acknowledging the write, the application will not get any exception, even if the 3rd node fails to write the data.

There is a retry policy available on the driver, which can allow for a retry in the event of a timeout. You should ensure, though, that the operation is idempotent when using this (for example, when appending an item to a list, a retry could result in the item being on the list twice on one of the replicas).
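
A sketch of how that looks with the Java driver 4.x - marking the statement idempotent is what permits the retry policy to retry it on a write timeout (the table here is hypothetical):

    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    // Safe to retry: running this UPDATE twice leaves the same end state.
    SimpleStatement write = SimpleStatement
            .newInstance("UPDATE my_ks.users SET name = ? WHERE id = ?", "alice", 42)
            .setIdempotent(true); // statements left non-idempotent are not retried on write timeout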

With your particular replication factor being 2, you are currently in a position where you have to give up either consistency guarantees or resilience:

  • one / local_one - only guarantees that one of the nodes got the write (both are likely to get it, but there is no guarantee provided).
  • quorum / local_quorum - requires both nodes to acknowledge, so you have no ability to handle a node failure.

This is because the quorum of 2 is 2 - if you used 3 nodes with RF=3, then local_quorum requires 2 of the 3, which would allow a node to be down while still providing a stronger guarantee on consistency.
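
The arithmetic, as a small sketch in Java:

    // Quorum size for a given replication factor: floor(RF / 2) + 1
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // quorum(2) == 2 -> with RF=2, quorum needs both replicas: no node may be down
    // quorum(3) == 2 -> with RF=3, quorum needs 2 of 3: one replica may be down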

– Andrew
  • Thanks for your answer. I have two more questions. 1. In my application source code, instead of manually setting which node IP address receives the write request, can we rely on the load balancing policy to choose the node that handles the write, and just set the contact point IP addresses in the driver configuration? – user2877989 Jun 02 '23 at 10:08
  • 2. On local_quorum you say "as long as the co-ordinator has 2 of the 3 nodes providing acknowledgement of the write". If the selected node IP address to be written to is 192.168.0.1, then what is the IP address of the co-ordinator? And if 192.168.0.1 is down, does it automatically co-ordinate to another node's IP address? I mean, all of the above from just one request from my application. – user2877989 Jun 02 '23 at 10:08
  • The IP you provide is the contact point used when making the initial connection; the driver will then discover the rest of the cluster. The contact point is not where all the queries are sent - it is just for the initial cluster discovery. The load balancing policy you use will determine where queries are sent: ideally a token-aware policy, which will choose a co-ordinator that is also one of the replicas for the data involved, or a round robin policy, which will simply rotate through the nodes (see the sketch after these comments). – Andrew Jun 02 '23 at 11:07
  • You would commonly configure multiple contact points in the application, to ensure that a single node being down does not prevent the application from connecting. Node outages are gossiped amongst the nodes, and the driver is also made aware of them so that it makes alternative choices. – Andrew Jun 02 '23 at 11:09
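
A minimal connection sketch with the Java driver 4.x, assuming two hypothetical contact points and the default (token-aware) load balancing policy:

    import java.net.InetSocketAddress;
    import com.datastax.oss.driver.api.core.CqlSession;

    // Contact points are only for the initial discovery; after connecting, the
    // driver learns the full topology and routes each query to a co-ordinator
    // chosen by the load balancing policy.
    CqlSession session = CqlSession.builder()
            .addContactPoint(new InetSocketAddress("192.168.0.1", 9042)) // hypothetical IPs
            .addContactPoint(new InetSocketAddress("192.168.0.2", 9042))
            .withLocalDatacenter("datacenter1") // required when contact points are set explicitly
            .build();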