Are the tablets distributed randomly? What I mean is: if a node goes down, does the load of the tablets on the failed node get distributed evenly among the remaining available nodes or does it shift to just one other peer node? The former would be highly desirable. In other words, does YugabyteDB do something like virtual nodes/tokens in Cassandra, but automatically?
1 Answer
In YugabyteDB, tablets are distributed across the nodes in a load-balanced manner.
In particular, when a node goes down, its load is spread evenly across the remaining eligible nodes. (See the example below for why "eligible" nodes need to be considered.) The burden of a failed node does NOT shift to just one peer node. So yes, you get a benefit similar to Apache Cassandra's virtual nodes, but automatically. This holds across the different YugabyteDB APIs (YSQL and YCQL).
As an example:
Suppose you have a 9-node, single-DC cluster with each node hosting, say, 96 tablets. Assuming a replication factor (RF) of 3, each node will be a leader for about a third of those (~32 tablets) and a follower for the remaining two-thirds (~64 tablets).
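To make that arithmetic explicit, here is a back-of-the-envelope check in Python; the node, tablet, and RF counts are just the figures assumed in this example, not fixed YugabyteDB values:

```python
nodes = 9            # nodes in the single-DC cluster
peers_per_node = 96  # tablet replicas (peers) hosted on each node
rf = 3               # replication factor

# Each tablet has rf replicas, so the number of distinct tablets is:
total_tablets = nodes * peers_per_node // rf            # 288 tablets

# Each tablet has exactly one leader, and leaders are balanced across nodes:
leaders_per_node = total_tablets // nodes               # 288 / 9 = 32 leaders
followers_per_node = peers_per_node - leaders_per_node  # 96 - 32 = 64 followers

print(leaders_per_node, followers_per_node)             # 32 64
```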
When a node goes down, the remaining 8 nodes take up the increased responsibility fairly evenly. Initially, the failed node's 32 leaders are redistributed among the 8 survivors, so each node becomes a leader for ~4 extra tablets. This failover happens fairly aggressively, as soon as followers stop hearing from a leader for a few heartbeats.
If the node stays down for an extended period, the 96 under-replicated tablets (now at RF=2) are brought back to RF=3, again by spreading the new replicas uniformly across the remaining 8 nodes.
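Continuing the same example, here is a rough sketch of how the failed node's load spreads over the 8 survivors. This is illustrative arithmetic only, not YugabyteDB's actual load-balancing algorithm:

```python
remaining_nodes = 8
failed_leaders = 32  # tablets the failed node was leading
failed_peers = 96    # all tablet replicas the failed node was hosting

# Step 1: leader failover -- the 32 leaderless tablets elect new leaders
# spread across the surviving followers.
extra_leaders_each = failed_leaders / remaining_nodes  # 4.0 extra leaders per node

# Step 2: if the node stays down, the 96 under-replicated tablets (now at
# RF=2) are re-replicated evenly onto the survivors to get back to RF=3.
extra_replicas_each = failed_peers / remaining_nodes   # 12.0 new replicas per node

print(extra_leaders_each, extra_replicas_each)         # 4.0 12.0
```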
In a multi-region setup, not all nodes might be eligible. For instance, in a 15-node, 3-region setup (5 nodes per region) with RF 3, if a node in one region goes down, only the remaining 4 nodes in that region will take up the failed node's load, again in an even manner. This is because data placement still needs to respect the constraint of replicating data across regions: we do not want multiple copies of the data to end up in the same region.
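A small sketch of that "eligible nodes" filter is below. The region and node names are hypothetical, and the 96-replicas-per-node figure is just carried over from the earlier example:

```python
# 15 nodes, 5 per region; each node is tagged with its region.
nodes = {f"{region}-n{i}": region
         for region in ("us-east", "us-west", "eu-west")
         for i in range(1, 6)}

failed = "us-east-n1"
failed_region = nodes[failed]

# With one replica per region, the other two regions already hold a copy of
# each affected tablet, so only same-region peers may absorb the load;
# otherwise a region would end up with two copies of a tablet.
eligible = [n for n, region in nodes.items()
            if n != failed and region == failed_region]

print(eligible)              # ['us-east-n2', 'us-east-n3', 'us-east-n4', 'us-east-n5']
print(96 / len(eligible))    # ~24 of the failed node's replicas land on each eligible node
```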
