I have a NoSQL cluster with RF = 3. This means that with 2 or 3 nodes down simultaneously, some pieces of data (those whose replicas all live on the failed nodes) become unavailable.

While the cluster is around 10 nodes, the chance of 2 nodes being down simultaneously is acceptably low. But with 1000 nodes it becomes quite likely: the more nodes we have, the higher the chance that some fixed number of them (equal to RF) will be down at the same time.
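
A rough back-of-the-envelope sketch of this effect, assuming independent node failures and an illustrative per-node outage probability of 0.1% (both assumptions, not measurements):

    from math import comb

    def p_at_least_k_down(n_nodes, p_down, k):
        # Binomial tail: probability that at least k of n_nodes
        # independent nodes are down at the same moment.
        return 1.0 - sum(
            comb(n_nodes, i) * p_down**i * (1.0 - p_down) ** (n_nodes - i)
            for i in range(k)
        )

    # Illustrative per-node outage probability of 0.1% (an assumption):
    for n in (10, 100, 1000):
        print(f"{n:>4} nodes: P(>= 3 down) = {p_at_least_k_down(n, 0.001, 3):.2e}")

With these numbers the probability grows from roughly 1e-07 at 10 nodes to about 8e-02 at 1000 nodes. Note that 3 down nodes actually lose data only if they happen to hold all replicas of the same piece of data, so this is an upper bound on the loss event itself, but the growth trend is the point.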

The question is: what is the general approach to keeping the chance of losing any data consistently low as the number of instances in the cluster grows?

P.S. Of course there are no absolutely simultaneous events; by "simultaneous" I mean "within a period short enough that the data cannot be streamed to a new node".

1 Answer

Most NoSQL systems are built around the following concepts:

  • Bloom filters
  • File systems
  • Backup strategy
  • Replication factor

Since you are worried about losing nodes at the same time: there is always a data rebuild strategy, driven by snapshots, the underlying file system, Bloom filters, etc.

The above applies to Cassandra, MongoDB, Elasticsearch, etc.

The leader and the monitoring tools it launches handle these corner cases. There is a chance you might not be able to rebuild the full data set you lost, but you can recover whatever data was there 5 minutes before the nodes went down [the default AWS snapshot-to-S3 capture runs every 5 minutes].
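
As a minimal sketch of the snapshot idea using boto3 (the volume ID is a placeholder and the scheduling is left out; adapt it to your own setup):

    import boto3

    ec2 = boto3.client("ec2")

    # Take a point-in-time EBS snapshot (persisted to S3 behind the
    # scenes) of the data volume backing one cluster node. The volume
    # ID below is a placeholder; a real backup job would iterate over
    # every node's volumes on a schedule.
    response = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="periodic NoSQL node data snapshot",
    )
    print("started snapshot:", response["SnapshotId"])

Restoring is then a matter of creating a new volume from the latest snapshot and attaching it to a replacement node.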

Rajasrikar