4

Lets say I have dynamic network of X servers (not fixed over time) with a identical copy of all events in a append only database. Now I'd like to support creating new events on any of these 10 servers and have them reach consensus, replicate the events and all result in the exact same event order. I understand this is a common problem and that there are algorithms that are supposed to handle things like this. But I don't fully grasp them and I have a few questions regarding consensus in relation to event sourcing in particular.

I assume that a server never can be completely sure that the value it think it has reached consensus about is really what ends up being the "correct" value? I base this on the fact that new servers can join to the network at any time and tip the balance to favor another value. This could potentially happen much later as well. But in this case, how should a server handle a new "correct" value? In event sourcing it is normal to append compensating events to make corrections but shouldn't these compensating event have to be replicated to all servers as well then? To make sure that all servers have the exact same events I mean.

If not adding compensating event but instead just "popping" already committed events we wouldn't have to replicate these I guess but then we would run into other problems instead. If the (wrongly) committed events are sent out on a event bus so that other services can react to them we can't just pop them from our event db without messing things up.

Or is it better to just really commit to a value once consensus have been reached within a small timeframe? And then treat all new/late servers with a cold hand? Make them accept the result anyway? What if the new server is connected to it's own network that is larger then the first one and they all have reached consensus on another value?

Andreas Zita
  • 7,232
  • 6
  • 54
  • 115
  • I would look at particular aggregate streams. In event sourced system you need consistency of event stream for any given aggregate. Which mean all commands for given aggregate should go to the same "master" replica of event stream. You don't really need much consistency between different aggregates. So among 10 of your servers for any given aggregate one should contain a master event stream. – Roman Eremin Nov 20 '17 at 14:26
  • The things is I have the same aggregate and event stream on all servers. I may seem odd but I'm trying to figure out how to have multiple write sides (no master) and handle consensus on the order of events (and which commands to accept or refuse). – Andreas Zita Nov 20 '17 at 15:15
  • What are your requirements with respect to availability, partition tolerance and consistency? It sounds like you want an AP system with eventual consistency (like CouchDB, say), but you need to be crystal clear on your requirements, otherwise you won't know which algorithms can be used (or whether an algorithm even theoretically exists). If you want a CP system, then Raft is probably your best bet (note with Raft, all writes must go through the elected leader). – TomW Nov 24 '17 at 12:29

1 Answers1

3

I don't know what your background is with distributed systems, but there are established consensus algorithms out there (e.g. Raft, Paxos, Viewstamped Replication, etc.) that handle adding and removing servers from a cluster without impacting committed events.

Generally, you don't apply events until they have been committed by a majority, especially since (as you mentioned) some applied events will have external visibility (e.g. you release $500 from an ATM machine). Thus, in order for a server to apply an event, it must know that the event has been committed. Furthermore, when servers are added to a system, they must be brought up-to-date with the events that have already been committed, and they may not choose a different value. If they could choose different values, the system would no longer provide safety. I would recommend reading the Raft paper. Raft is a consensus algorithm that may work for you, and the algorithm isn't difficult to understand. Raft specifically handles adding and removing servers (see section 6). There is also an implementation of Raft available that you can use.

There are other algorithms that support weak consistency and undo operations, such as Bayou. The algorithm you choose ultimately depends on the needs of your application.

Jack Humphries
  • 13,056
  • 14
  • 84
  • 125