I'm studying up for system design interviews and have run into this pattern in several different problems. Imagine I have a large volume of work that needs to be repeatedly processed at some cadence. For example, I have a large number of alert configurations that need to be checked every 5 min to see if the alert threshold has been breached.
The general approach is to split the work across a cluster of servers for scalability and fault tolerance. Each server would work as follows:
```
start up
read assigned shard
while true:
    process the assigned shard
    sleep 5 min
```
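The loop above can be sketched in Python. `process_shard` is a hypothetical stand-in for the real alert-checking work, and the `passes`/`interval_sec` parameters exist only so the sketch can terminate; a real worker would loop forever with a 5-minute sleep:

```python
import time

def process_shard(shard_index):
    # Hypothetical stand-in for checking every alert config in the shard.
    return f"processed shard {shard_index}"

def run_loop(shard_index, passes, interval_sec=300):
    """Process the assigned shard `passes` times, sleeping between passes.

    A production worker would loop indefinitely with interval_sec=300 (5 min).
    """
    results = []
    for p in range(passes):
        results.append(process_shard(shard_index))
        if p < passes - 1:  # no sleep after the final pass
            time.sleep(interval_sec)
    return results
```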
Based on this answer (ZooKeeper for assigning shard indexes), I came up with the following approach using ZooKeeper:
- When a server starts up, it adds itself as a sequential child node under `/service` (i.e. `/service/{server-id}`) and watches the children of that node. ZooKeeper assigns the server a unique sequence number.
- The server reads its unique sequence number `i` from ZooKeeper. It also reads the total number of children `n` under the `/service` node.
- The server identifies its shard by dividing the total volume of work into `n` pieces and locating the `i`th piece.
- While true:
  - If the watch has triggered (because servers have been added to or removed from the cluster), the server recalculates its shard.
  - The server processes its shard.
  - Sleep 5 min.
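The shard-identification step (splitting the total work into `n` pieces and taking the `i`th) can be written as a small pure function. `shard_range` below is a hypothetical helper, assuming the work is an indexable collection of `total_items` alert configurations:

```python
def shard_range(i, n, total_items):
    """Return the half-open [start, end) slice of items owned by server i
    when total_items are split across n servers as evenly as possible."""
    if n <= 0 or not (0 <= i < n):
        raise ValueError("need 0 <= i < n")
    base, extra = divmod(total_items, n)
    # The first `extra` servers each take one extra item.
    start = i * base + min(i, extra)
    end = start + base + (1 if i < extra else 0)
    return start, end
```

Every server can run this independently given only its own `i` and the current `n`, which is what makes the watch-and-recalculate step cheap.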
Does this sound reasonable? Is this generally how it's done in real-world systems? A few questions:
- In step #2, when the server reads the number of children, does it need to wait a period of time to let things settle down? What if every server is joining at the same time?
- I'm not sure how timely the watch would be. It seems like there would be a window where this server is still processing its old shard while a reassignment causes another server to pick up a shard that overlaps with what this server is processing, resulting in duplicate processing (which may or may not be OK). Is there any way to solve this?
Thanks!