I have a distributed application across which I'd like to replicate a single, eventually consistent state. The data is suitable for a CRDT (http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf), which has the excellent property that each node, given the same set of messages, will deterministically converge to the same value without a complicated consensus protocol.
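To illustrate that convergence property, here is a minimal sketch in Python. It assumes a state-based grow-only counter as the CRDT purely for the example (not necessarily the actual data type), and the names `merge`, `value`, and `apply_all` are hypothetical. Because the merge is commutative, associative, and idempotent, the order in which messages arrive, and whether any are duplicated, does not affect the result:

```python
from itertools import permutations

def merge(a, b):
    """Join two grow-only counter states by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

def value(state):
    """The counter's value is the sum of every node's contribution."""
    return sum(state.values())

def apply_all(msgs):
    """Fold a sequence of received states into an initially empty replica."""
    state = {}
    for m in msgs:
        state = merge(state, m)
    return state

# Hypothetical messages; {"b": 1} is deliberately delivered twice.
messages = [{"a": 2}, {"b": 1}, {"a": 3, "c": 4}, {"b": 1}]

# Every delivery order yields the same final state.
results = [apply_all(order) for order in permutations(messages)]
assert all(r == results[0] for r in results)
converged = results[0]
```

This only shows that replicas agree once they have seen the same set of messages; it says nothing about how the messages get delivered.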
However, I need another messaging/log layer that will ensure that each node actually sees every message, even in the face of adverse network conditions.
Specifically, I'm looking for an algorithm that has the following properties:
- Works on an asynchronous network.
- Nodes are only guaranteed to be aware of their neighbors, not the whole network.
- Nodes may be added or dropped at any time (that is, the network is not of a fixed size or topology).
- The network can be acyclic (this can be a requirement, if necessary).
- Is capable of bringing up to date a node that has fallen behind due to a temporary network outage or dropped messages.
- Is capable of bringing a new, empty node joining the cluster up to date.
- There is no hard limit on the time taken for the network to converge on a value (that is, for every node to receive every message), but in the absence of partitions it should be fairly quick (in fuzzy terms, a matter of seconds, not minutes).
- Is bounded in size. Algorithms that keep the entire message history (which will grow without bound) are unsuitable.
Is anyone aware of an algorithm with these properties?