
Given a 2x5-node setup (two DCs with five nodes each) and a replication factor of 3, and given that we create views asynchronously (so we can safely retry failed operations), does using WRITE=ALL and READ=ONE make sense?

If one replica fails, how can we know its recovery time, so that we can pick the right retry interval and timeout?

kboom

2 Answers


Any of the combinations below should give you correct (up-to-date) data:

  1. WRITE=ALL READ=ONE
  2. WRITE=ONE READ=ALL
  3. WRITE=LOCAL_QUORUM READ=LOCAL_QUORUM
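
All three combinations rely on the same rule: a read is guaranteed to see the latest write when the replicas that acknowledged the write and the replicas that answered the read must overlap, i.e. when write acks + read acks exceed the number of replicas they are drawn from. The sketch below checks that rule for the layout in the question, assuming RF=3 in each of two DCs (6 replicas total) and assuming reads and writes are coordinated from the same DC; the DC names and helper functions are hypothetical, and the model ignores timeouts and repair.

```python
# Assumed layout: RF=3 in each of two DCs (6 replicas in total).
# DC names are hypothetical.
RF_PER_DC = {"dc1": 3, "dc2": 3}


def acks_and_pool(cl, local_dc):
    """Return (acks the CL waits for, size of the replica pool they come from)."""
    total = sum(RF_PER_DC.values())
    local = RF_PER_DC[local_dc]
    if cl == "ONE":
        return 1, total
    if cl == "QUORUM":
        return total // 2 + 1, total
    if cl == "ALL":
        return total, total
    if cl == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own DC count.
        return local // 2 + 1, local
    raise ValueError(f"unsupported consistency level: {cl}")


def read_sees_latest_write(write_cl, read_cl, local_dc="dc1"):
    """True if every possible read set must intersect every possible write set.

    With both requests coordinated from the same DC, the local pool is a
    subset of the global one, and the two sets are forced to intersect
    exactly when w + r exceeds the size of the larger pool.
    """
    w, w_pool = acks_and_pool(write_cl, local_dc)
    r, r_pool = acks_and_pool(read_cl, local_dc)
    return w + r > max(w_pool, r_pool)


for write_cl, read_cl in [("ALL", "ONE"), ("ONE", "ALL"),
                          ("LOCAL_QUORUM", "LOCAL_QUORUM"), ("ONE", "ONE")]:
    print(write_cl, read_cl, read_sees_latest_write(write_cl, read_cl))
```

The last pair (ONE/ONE) is included as a counter-example: 1 + 1 does not exceed 6, so a read may hit a replica that has not received the write yet.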

You can tune the consistency level in your application according to its load.

In my opinion, option 3 (LOCAL_QUORUM) should work better, provided reads and writes go through the same data center: sometimes a node can be under high load or even down, and with LOCAL_QUORUM your application will not be affected.

If you have more writes than reads, a write CL of ALL will make your application slow.

Anil Kapoor

The combination of WRITE=ALL and READ=ONE is correct in the sense of consistency: after you've written to all the replicas, you can indeed read from any one of them and expect the latest data. However, it is bad for high availability: if any one of the 6 replicas (3 in each DC) is down, a write cannot complete. If one of the nodes is down for an hour, you cannot do any writes for an hour. In some batch-processing setups this may make sense, but it is usually not acceptable for interactive workloads, where high availability is a primary concern.
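
To make the availability difference concrete, here is a simplified model of whether a write can even start at a given consistency level, based only on how many replicas of the partition are alive. It assumes RF=3 in each of two DCs and a coordinator in `dc1`; the DC names and the `write_can_complete` helper are hypothetical, and real coordinators also deal with timeouts and nodes that are slow rather than down.

```python
# Assumed layout: RF=3 in each of two DCs. DC names are hypothetical.
RF_PER_DC = {"dc1": 3, "dc2": 3}


def write_can_complete(cl, live, local_dc="dc1"):
    """live maps DC name -> number of live replicas of the partition."""
    total_rf = sum(RF_PER_DC.values())
    total_live = sum(live.values())
    if cl == "ONE":
        return total_live >= 1
    if cl == "QUORUM":
        return total_live >= total_rf // 2 + 1
    if cl == "LOCAL_QUORUM":
        # Only the coordinator's own DC matters.
        return live[local_dc] >= RF_PER_DC[local_dc] // 2 + 1
    if cl == "ALL":
        # Every replica in every DC must be up.
        return total_live == total_rf
    raise ValueError(f"unsupported consistency level: {cl}")


# A single replica down in the remote DC.
one_down = {"dc1": 3, "dc2": 2}
for cl in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(cl, write_can_complete(cl, one_down))
# ALL False
# QUORUM True
# LOCAL_QUORUM True
# ONE True
```

A single failed node anywhere in the cluster is enough to block ALL, while the other levels keep accepting writes.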

If you really don't care about high availability and just want to write when all the nodes are up, then I guess WRITE=ALL could work. You can tell when all the nodes are up using "nodetool status", for example, or just retry the writes periodically.
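
Since a replica's recovery time cannot be known in advance, one common way to "retry the writes periodically" is exponential backoff with random jitter, capped by an overall deadline. The sketch below assumes your write is wrapped in a hypothetical zero-argument callable `op`; which exceptions count as retryable depends on your driver, so they are passed in as `retry_on` rather than named here. The default delays are illustrative, not recommendations.

```python
import random
import time


def retry_with_backoff(op, retry_on=(Exception,), max_attempts=8,
                       base_delay=1.0, max_delay=60.0, deadline=600.0,
                       sleep=time.sleep):
    """Call op() until it succeeds, backing off exponentially between tries.

    op is a hypothetical callable wrapping the actual write; retry_on lists
    the exception types treated as transient (driver-specific).
    """
    start = time.monotonic()
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth capped at max_delay, then "full jitter"
            # so many clients don't retry in lockstep.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            if time.monotonic() - start + delay > deadline:
                raise  # waiting again would exceed the overall deadline
            sleep(delay)
```

The overall `deadline` acts as the timeout the question asks about: rather than guessing how long a replica will stay down, you decide how long the asynchronous view-creation job is allowed to keep trying before it gives up and reports a failure.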

Nadav Har'El