I have a NoSQL cluster with RF = 3. This means that if 2 or 3 nodes are down simultaneously (3 to lose every replica, 2 to lose a quorum), some pieces of data, namely those whose replicas sit on the failed nodes, will become unavailable.
With a cluster of about 10 nodes, the chance of 2 nodes being down simultaneously is acceptably low. But with 1000 nodes it becomes a real possibility. The more nodes we have, the higher the chance that some fixed number of them (equal to RF) will be down at the same time.
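To put rough numbers on this, here is a minimal sketch. It assumes node failures are independent and that each node is down within the recovery window with the same probability `P_DOWN`. The value 0.001 is purely hypothetical, and `prob_at_least_k_down` is just an illustrative helper name. It computes the chance that at least RF nodes are down at once, which is a necessary condition for losing all replicas of some piece of data. Whether data is actually lost also depends on how replicas are placed.

```python
from math import comb


def prob_at_least_k_down(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes are down at once.

    Assumes independent failures, each node down with probability p
    (a simplification; real failures are often correlated).
    """
    # Complement of "fewer than k nodes down" under Binomial(n, p).
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


P_DOWN = 0.001  # hypothetical per-node probability of being down in the window
RF = 3

for n in (10, 100, 1000):
    print(n, prob_at_least_k_down(n, RF, P_DOWN))
```

Under these assumptions the probability grows quickly with the cluster size for a fixed RF, which is exactly the effect I am worried about.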
The question is: what is the general approach to keeping the chance of losing any data consistently low as the number of instances in the cluster grows?
P.S. Of course, no events are truly simultaneous. By "simultaneous" I mean "within a period too short for the data to be streamed to a new node".