37

When is it a good idea to use something like a CRDT instead of Paxos or Raft?

Eric des Courtis

7 Answers

38

If you can use something like a CRDT, do so. You should get much better performance. It also enables interesting use cases such as working offline and then merging later. However, it is not always possible to design things so that a CRDT will work for you. In that case, Paxos can solve the hard problems for you.

But even if you've decided to use Paxos, you should generally limit how much work is done directly through the Paxos algorithm. For performance reasons, reserve Paxos for necessary operations such as master election, and then let a replicated master setup handle most decisions. (In a high-throughput environment the master is likely to delegate responsibility for specific shards to specific children, which replicate off each other. Do not let the master become a bottleneck.)

That said, it is much easier to claim that you'll wave the magic wand of Paxos than it is to actually do it in practice. In that light you may find http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/chubby-osdi06.pdf to be an interesting description of the difficulties that a real-world Paxos implementation is likely to encounter.

btilly
  • 7
    For anyone today who reads this, if you're thinking of implementing Paxos then you probably should use ZooKeeper instead. And if you're trying to deal with distributed consistency, you should read https://aphyr.com/tags/jepsen and then think long and hard about not trying to reinvent the wheel. – btilly Sep 28 '15 at 23:31
16

I think this guy knows what he is talking about:

Blog

Video

Conclusion about distributed systems

Antiarchitect
10

CRDTs and Paxos have different goals and are used in different scenarios. What they have in common is that they help programmers deal with concurrency and replication. CRDTs are data types that assume concurrent updates will occur. Paxos is a protocol that ensures they won't, by enforcing a total order on them. Let's see this in more detail.

Let's say we have a set which is replicated at two different places.

Using Paxos guarantees that writes to the set will be executed by every replica in the same order. More generally, it guarantees that all replicas AGREE on how the state of the set evolves.

If you have, for example, user1 performing update1 at replica1, adding element1 to the replicated set, while user2 simultaneously performs update2, adding element2 at replica2, Paxos will make the replicas agree on a given order for those updates, or possibly agree on choosing one of the two updates and discarding the other, depending on how you use it and what you want to achieve. If the Paxos outcome is, say, that update1 comes before update2, every replica will update the set in that order. As a consequence, users reading the set concurrently with those updates can observe, regardless of where (at which replica) they read, ONLY the following states of the set (assuming the set was empty at the beginning):

{} (empty set)

{element1}

{element1, element2}

Furthermore, these states can be seen ONLY in that order, meaning that once the state of the set is {element1, element2} (at every replica), no subsequent read will return {} or {element1}.
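
To make the ordering guarantee concrete, here is a minimal Python sketch. It does not implement Paxos; it simply assumes consensus has already produced one agreed log of updates (the names `agreed_log` and `apply_log` are made up for illustration) and shows that replicas applying that log step through the same states in the same order.

```python
# Hypothetical sketch: consensus (e.g. Paxos) is assumed to have already
# produced one agreed order of updates; the protocol itself is NOT shown.
agreed_log = ["element1", "element2"]  # outcome: update1 before update2


def apply_log(log):
    """Apply the agreed log in order, recording every state a reader could see."""
    state = set()
    observed = [frozenset(state)]
    for element in log:
        state.add(element)
        observed.append(frozenset(state))
    return observed


# Both replicas apply the same log, so they pass through identical states.
replica1_states = apply_log(agreed_log)
replica2_states = apply_log(agreed_log)
```

Here `replica1_states` and `replica2_states` are the same list, {} then {element1} then {element1, element2}; the state {element2} alone never appears at either replica.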

Positive side: This set is simple to reason about, as it is equivalent to a set that is not replicated.

Negative side: Unavailability: if replicas can't talk to each other (network partition), your set can't be updated, as there can be no agreement. Low performance, high latency: agreement requires that replicas synchronize before replying to the client. This incurs latency proportional to the latency between replicas.

CRDTs have weaker guarantees. A CRDT set is not equivalent to a sequential, single-copy one. It assumes that there is no agreement or total order on how replicas are updated.

CRDTs guarantee that if both replicas of the set have seen the same updates (regardless of the order in which they see them), then they will exhibit the same state; replicas will converge.

In our example of two users performing updates concurrently, a system that does not run Paxos to order operations on the set (this happens, e.g., under eventual or causal consistency) will allow replica1 to add element1 while replica2 is adding element2,

so, the state at replica1 will be: {element1}

and the state at replica2 will be: {element2}

At this point in time, replicas diverge. Later, when replicas synchronise, they will exchange their updates, finally exhibiting this state:

state at replica1 will be: {element1, element2}

state at replica2 will be: {element2, element1}

At this point in time, replicas have converged.
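
The diverge-then-converge behaviour above can be sketched with a grow-only set, one of the simplest state-based CRDTs. This is only an illustration under that assumption (the class name `GSet` and its methods are my own, not from any particular library): merging is plain set union, which is commutative, associative and idempotent.

```python
class GSet:
    """Grow-only set: a state-based CRDT whose merge is set union."""

    def __init__(self):
        self.elements = set()

    def add(self, element):
        self.elements.add(element)

    def merge(self, other):
        # Union is commutative, associative and idempotent, so the order
        # and number of merges do not affect the final state.
        self.elements |= other.elements


replica1, replica2 = GSet(), GSet()
replica1.add("element1")   # update1 at replica1
replica2.add("element2")   # update2 at replica2, concurrently
diverged = replica1.elements != replica2.elements

replica1.merge(replica2)   # background synchronisation, in both directions
replica2.merge(replica1)
converged = replica1.elements == replica2.elements
```

Before the merges the replicas hold different states; after exchanging state in either order they hold the same two elements.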

Users reading the set concurrently with those updates can observe, depending on where (at which replica) they read, the following states of the set (assuming the set was empty at the beginning):

{} (empty set)

{element1} (if they read from replica1)

{element2} (if they read from replica2)

{element1, element2}

{element2, element1}

Negative side: This set is hard to reason about, as it shows states that could not occur in a sequential set. In our example, we have observed only the case of two concurrent adds to a set, which is straightforward. Concurrent adds and removes are more complex. There are many datatypes with different issues:

A comprehensive study of Convergent and Commutative Replicated Data Types
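
As a rough illustration of why concurrent adds and removes are harder, here is a simplified observed-remove set, in the spirit of the OR-Set described in the study cited above. The class and method names are my own, and tombstones are never garbage collected in this sketch. Each add gets a unique tag, and a remove only cancels the tags it has actually seen, so an add that happens concurrently with a remove survives the merge.

```python
import uuid


class ORSet:
    """Simplified observed-remove set: every add carries a unique tag and a
    remove only cancels the tags it has observed locally."""

    def __init__(self):
        self.added = set()     # (element, tag) pairs
        self.removed = set()   # (element, tag) pairs that were removed

    def add(self, element):
        self.added.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags visible at this replica right now.
        self.removed |= {pair for pair in self.added if pair[0] == element}

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return {element for element, _tag in self.added - self.removed}


a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)       # both replicas now contain x
a.remove("x")    # replica a removes x ...
b.add("x")       # ... while replica b concurrently adds it again
a.merge(b)
b.merge(a)
```

After the final merges both replicas contain x, because the remove at replica a never observed the tag created by the concurrent add at replica b. A sequential set would have to pick one order for the two operations; this CRDT instead fixes the rule "add wins", which is one of the quirks you have to decide is acceptable for your application.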

Positive side: High availability: if replicas can't talk to each other (network partition), your set CAN be updated. Replicas will sync when they reconnect. High performance, low latency: replicas reply to clients immediately and synchronize in the background, after replying to the client.

alek
4

There is a flaw with the CRDT Treedoc example. Each node requires a disambiguator for the case when two systems insert at the same time with the same key.

After this happens it is no longer possible for systems to insert between the entries that have identical keys but different disambiguators, as that requires the system to insert another identical key while controlling the disambiguator ordering. The disambiguators are not dense, so this is not always possible. If the disambiguators were yet another tree, you would solve one problem but then need another conflict resolution mechanism one level further down, and so on.

This unmentioned problem, plus the fact that you need to do a two-phase commit to tidy up the metadata, makes me think CRDTs are still a work in progress.

Tom Larkworthy
  • 1
    What do you think about the Riak implementation of CRDTs? – Eric des Courtis May 13 '14 at 18:15
  • 3
    The different flavours of sets seem not to have the same problem as the Treedoc; they are on firmer ground. I was a bit too general in my negativity, but the Treedoc was my first exposure to CRDTs. Note that the different sets all have drawbacks and quirks though; none is actually a set in the mathematical sense, which is why there are so many different types. So CRDTs are not a total solution: you have to work out which quirks are OK for your application. But there was never going to be a total solution given CAP. – Tom Larkworthy May 13 '14 at 18:46
3

When eventual consistency is an option for your application or data model and converging state is a good fit. If you need linearizability over your data, are building a replicated state machine, or need leader election, cluster membership view changes, or consensus on something with guaranteed liveness, then you need Paxos or Raft. Check your requirements.

Paxos and Raft are equivalent protocols. It's a myth that Raft is simpler, although many people perceive it that way. Don't write your own. Use ZooKeeper if Zab fits, etcd, or any other existing implementation.

Common usage:

CRDTs are used for data sync (e.g. between mobile devices and/or servers), collaborative editing, value sync in distributed database implementations, and all other cases where eventual consistency is fine.

Paxos and its variations are mostly used in proprietary systems, infrastructure systems (e.g. Chubby), and distributed systems and databases such as BigTable, Datastore, Spinnaker, Spanner, Cassandra, Scylla, etc.

Raft has become popular and is present in many OSS infrastructure projects such as etcd and Consul. Databases such as CockroachDB, TiDB and Scylla also use Raft. There are also BFT (Byzantine fault tolerant) protocols, but they are used less.

2

There are multiple metrics to consider:

  • throughput (CRDT and Paxos are the same, because all requests are replicated to all replicas in the end either way);
  • latency (CRDT is better than Paxos because it writes to a smaller number of replicas);
  • reliability (CRDT is weaker than Paxos because it writes to a smaller number of replicas (fewer than a majority), which may result in lost state);
  • consistency (CRDT is weaker than Paxos because it allows concurrent writes without a synchronization point (basically no overlapping replicas), while a Paxos write always requires an overlapping replica to do the serialization).
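
The "overlapping replica" point in the last bullet follows from majority quorums: any two majorities of the same cluster share at least one member. A small Python check, assuming a hypothetical five-replica cluster (the replica names are placeholders), enumerates every majority and confirms that each pair intersects.

```python
from itertools import combinations

replicas = ["r1", "r2", "r3", "r4", "r5"]   # hypothetical 5-node cluster
majority = len(replicas) // 2 + 1           # 3 of 5

# Every possible majority quorum, and whether each pair shares a replica.
quorums = [set(q) for q in combinations(replicas, majority)]
always_overlap = all(q1 & q2 for q1 in quorums for q2 in quorums)
```

With five replicas there are ten possible majorities and `always_overlap` is true. A CRDT write that is acknowledged by a single replica has no such overlap with other writes, which is where both the lower latency and the weaker consistency come from.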

My suggestion is to use Paxos when the replicas are not far from each other (e.g., within a data center), and to use CRDTs when network partitioning is normal (e.g., disconnected mobile devices).

imzhenyu
-2

Whenever it is appropriate. However, Paxos is not that bad: its throughput is typically the same as with CRDTs, its reliability is much higher (a CRDT may lose state), and its latency is not that bad either, as it only requires replies from a majority of the replicas instead of all of them.

imzhenyu
  • 2
    Could you please back up the statement "PaxOS is not that bad as its throughput is typically the same as with CRDT" and "CRDT may result state lost" when compared to Paxos. – Eric des Courtis Mar 05 '14 at 16:08
  • All write requests need to be propagated to all replicas in the end, which means that the total amount of processing is the same with either Paxos or CRDTs. A CRDT loses state when a replica works offline and loses its state before the merge happens. – imzhenyu May 09 '14 at 06:26
  • While it is true that all write requests need to be propagated to all replicas, I don't think Paxos actually results in any performance gains in comparison with CRDTs. With CRDTs you gain performance from the fact that you can write to many nodes simultaneously. Not to mention that with Paxos you typically need to do replication anyway. Paxos also has the property that it forces you to have a bottleneck (the master node). – Eric des Courtis May 13 '14 at 18:09
  • I agree CRDT has better latency performance with some cost. See my new answer above (too long to post as a comment). – imzhenyu Jun 19 '14 at 09:35