
Let's assume we have three geographically distributed data centers: A, B, and C. In each of these, a Cassandra cluster is up and running. Now assume DC A can no longer gossip with B and C.

Writes to A with LOCAL_QUORUM would still be satisfied, but they would no longer be propagated to B and C, and vice versa.

This situation could have some very disastrous consequences...

What I'm looking for are some tips on how to rapidly ascertain that DC A has become 'isolated' from the other data centers (using the Native Java driver).

I remember reading about push notifications, but I seem to recall they referred only to the status of the local cluster. Does anybody have any ideas? Thanks.

David Semeria

2 Answers


The first thing to note is that if A can no longer connect to B and C, hints will be stored and delivered once the network connection is restored. Hints are only collected for a limited window (`max_hint_window_in_ms`, three hours by default), so for outages shorter than that there is already a safety mechanism and you don't need to do anything.

For longer outages, the best practice is to run the repair command after the outage to synchronize the replicas.

That said, if you are looking for a way to determine when inter-DC communication has been disrupted, you have several options.

1) Use a tool like DataStax OpsCenter to monitor your cluster state. It automatically discovers these sorts of events and logs them. I believe you can also set up triggered events, but I'm not an expert in how OpsCenter works.

2) Use the Java driver's `public Cluster register(Host.StateListener listener)` method to register a listener that is called on node-down events. You can then determine when entire DCs go down.

3) Track the current state of gossip via JMX in each of the DCs. This lets you see what each data center thinks about the current availability of all the machines. You could do this directly or via `nodetool status`.
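For option 2, the bookkeeping behind "determine when entire DCs go down" could look roughly like the sketch below. It is deliberately independent of the driver: `DcIsolationTracker` and all of its method names are hypothetical, not part of the DataStax API, and the class only counts which nodes are believed to be up per data center. It also assumes the local DC name is known up front and that "isolated" means every known remote DC has no node marked up.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of the DataStax driver): tracks which
// nodes are believed to be up in each data center.
final class DcIsolationTracker {
    private final String localDc;
    // Data center name -> addresses of nodes currently marked up.
    private final Map<String, Set<String>> upNodes = new ConcurrentHashMap<>();
    // Every data center name seen so far, whether its nodes are up or down.
    private final Set<String> knownDcs = ConcurrentHashMap.newKeySet();

    DcIsolationTracker(String localDc) {
        this.localDc = localDc;
    }

    // Record that a node in the given data center was reported up.
    void nodeUp(String dc, String address) {
        knownDcs.add(dc);
        upNodes.computeIfAbsent(dc, k -> ConcurrentHashMap.newKeySet()).add(address);
    }

    // Record that a node in the given data center was reported down.
    void nodeDown(String dc, String address) {
        knownDcs.add(dc);
        Set<String> up = upNodes.get(dc);
        if (up != null) {
            up.remove(address);
        }
    }

    // True when no node in the data center is marked up
    // (also true for a data center that was never seen).
    boolean isDatacenterDown(String dc) {
        Set<String> up = upNodes.get(dc);
        return up == null || up.isEmpty();
    }

    // True when at least one remote data center is known and
    // every known remote data center has no node marked up.
    boolean isIsolated() {
        boolean sawRemote = false;
        for (String dc : knownDcs) {
            if (dc.equals(localDc)) {
                continue;
            }
            sawRemote = true;
            if (!isDatacenterDown(dc)) {
                return false;
            }
        }
        return sawRemote;
    }
}
```

In a `Host.StateListener` implementation, `onUp(Host host)` and `onAdd(Host host)` would call `nodeUp(host.getDatacenter(), host.getAddress().getHostAddress())`, and `onDown(Host host)` and `onRemove(Host host)` would call `nodeDown(...)` the same way. The application could then check `isIsolated()` before accepting writes it considers unsafe. Keep in mind that this only reflects the events the driver delivers to the client, which may not match what gossip on each node reports.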

RussS
  • Thanks Russ. My aim would be for a DC to automatically stop accepting certain writes if it thinks it is isolated. Number 2 looks interesting. – David Semeria May 26 '14 at 07:25

@RussS I don't think point (2) works when all three hosts are unreachable.

For example, I implemented a state listener and pointed it at my cluster from my local machine. I can see that the listener gets invoked when nodes go up or down, but I don't see it being invoked when I unplug my ether

Dhyan