
I think I might be confusing concepts here, but it seems to me like Paxos would provide linearizable consistency for systems that implement it.

I know Cassandra uses it. I'm not 100% clear on how, but assuming a leader is elected and that single leader does all the writes, then communication is synchronous and real-time linearizability is achieved, right?

But consensus algorithms like Paxos are generally considered partially synchronous because they rely on a quorum rather than on communication with 100% of the nodes. Does this also mean it's not truly linearizable?

Maybe, because only a quorum is involved, a node could fall out of sync and that would break linearizability?

red888

1 Answer


A linearizable system does not need to be synchronous. Linearizability is a safety property: it says that nothing bad happens, and a system still satisfies it even if nothing good happens either. Any reads or writes that do not return (or that return an error) can be ignored when checking for linearizability. This means it's perfectly possible for a system to be linearizable even if one or more of the nodes are faulty, partitioned, or running slowly.
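To make the "safety property" point concrete, here is a minimal brute-force linearizability checker for a single integer register, suitable only for tiny histories. The `Op` record and function names are hypothetical, not from any library. One subtlety: a write that never returned might still have taken effect, so this sketch is allowed either to drop it or to include it, while reads that never returned are simply dropped.

```python
from dataclasses import dataclass
from itertools import combinations, permutations
from typing import Optional, Sequence


@dataclass(frozen=True)
class Op:
    kind: str                  # "write" or "read"
    value: int                 # value written, or value the read returned
    invoke: float              # time the client sent the request
    response: Optional[float]  # time the reply arrived; None if it never did


def respects_real_time(order: Sequence[Op]) -> bool:
    # If an op finished before another op started, it must come first.
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.response is not None and later.response < earlier.invoke:
                return False
    return True


def legal_for_register(order: Sequence[Op], initial: int) -> bool:
    # Every read must return the value of the most recent preceding write.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history: Sequence[Op], initial: int = 0) -> bool:
    completed = [op for op in history if op.response is not None]
    # Reads that never returned are dropped. A write that never returned
    # may or may not have taken effect, so try both possibilities.
    pending_writes = [op for op in history
                      if op.response is None and op.kind == "write"]
    for k in range(len(pending_writes) + 1):
        for chosen in combinations(pending_writes, k):
            for order in permutations(completed + list(chosen)):
                if respects_real_time(order) and legal_for_register(order, initial):
                    return True
    return False


if __name__ == "__main__":
    stale = [Op("write", 1, 0, 1), Op("read", 0, 2, 3)]
    hung_write = [Op("write", 5, 0, None), Op("read", 0, 2, 3)]
    print(is_linearizable(stale))       # False: read began after write 1 completed
    print(is_linearizable(hung_write))  # True: the unfinished write may never have happened
```

A history in which some request hangs forever can still pass the check, which is exactly why a partitioned or slow node does not by itself break linearizability.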

Paxos is commonly used to implement a replicated state machine: a system that executes a sequence of operations on multiple nodes at once. Since the operations are deterministic and the nodes all agree on the operations to run and the sequence in which to run them, the nodes all converge to the same state (eventually).

You can implement a linearizable system using Paxos by putting both writes _and reads_ into the sequence of operations, relying on the fact that the Paxos protocol places those operations in a single totally-ordered sequence (i.e. linearizes them).
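A single-process sketch of putting reads into the sequence: each operation, read or write, is assigned a slot in one ordered log, and a read's answer is defined by its slot, i.e. it sees exactly the writes ordered before it. `OrderedLogStore` and `submit` are hypothetical names, and `submit` just appends locally as a stand-in for running a Paxos instance to agree on the next slot.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedLogStore:
    log: list = field(default_factory=list)

    def submit(self, op: tuple) -> int:
        # Stand-in for Paxos: agreement is trivial with one process.
        # Returns the slot number the operation was placed in.
        self.log.append(op)
        return len(self.log) - 1

    def result_at(self, slot: int):
        # Replay the log up to and including `slot`. A read returns the value
        # of the latest write ordered before it, however late we evaluate it.
        state = {}
        result = None
        for i, (kind, key, *rest) in enumerate(self.log[: slot + 1]):
            if kind == "write":
                state[key] = rest[0]
            elif i == slot:
                result = state.get(key)
        return result


store = OrderedLogStore()
store.submit(("write", "x", 1))
first_read = store.submit(("read", "x"))
store.submit(("write", "x", 2))
second_read = store.submit(("read", "x"))
```

Even though `first_read` is evaluated after the second write was appended, `store.result_at(first_read)` is `1` because of where the read sits in the order, while `store.result_at(second_read)` is `2`.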

It's important to put the reads in the sequence as well as the writes. Imagine instead you only used Paxos to agree on the writes, and served reads directly from a node's local state. If the node serving the reads is partitioned from the other nodes then it would serve stale reads, violating linearizability. Each read must involve a quorum of nodes to ensure that the returned value is fresh, which means (effectively) putting the read into the sequence alongside the writes.
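The reason a quorum helps is quorum intersection: any two majorities of the same cluster share at least one node, so a majority read always reaches some node that saw the latest majority-acknowledged write, whereas a single-node read can miss it. The sketch below assumes hypothetical per-node version numbers, models the write as already acknowledged by a majority, and ignores writes still in flight; a real protocol has to handle those too (in Paxos, by ordering the read through the log as described above).

```python
from itertools import combinations

NODES = ["A", "B", "C", "D", "E"]


def majorities(nodes: list) -> list:
    # Every subset of size floor(n/2) + 1.
    q = len(nodes) // 2 + 1
    return [set(c) for c in combinations(nodes, q)]


def always_intersect(quorums_a: list, quorums_b: list) -> bool:
    return all(a & b for a in quorums_a for b in quorums_b)


# Each node stores (version, value). Version 2 reached only A, B and C
# (a majority); D and E are partitioned and still hold version 1.
store = {n: (1, "old") for n in NODES}
for n in ["A", "B", "C"]:
    store[n] = (2, "new")


def local_read(node: str) -> str:
    # Serve from one node's local state: may be stale.
    return store[node][1]


def quorum_read(quorum: set) -> str:
    # Take the highest version seen in the quorum.
    return max(store[n] for n in quorum)[1]
```

Here `local_read("E")` returns `"old"`, but `quorum_read(q)` returns `"new"` for every majority `q`, because each majority overlaps {A, B, C}.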

(There are some tricks you can play to make reads a bit more efficient than writes, given that reads commute with each other and don't need to be persisted to disk, but you can't escape the need to contact a quorum of nodes for both read and write operations.)
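One such trick, described for Raft as "ReadIndex", is sketched below in simplified form: the leader notes its current commit index, confirms it is still leader with a single heartbeat round to a majority (shared by a whole batch of reads, since reads commute), waits until it has applied up to that index, and then answers from local state without appending the reads to the log or writing them to disk. The `Leader` class and its fields are hypothetical; the heartbeat acknowledgements are passed in rather than sent over a network, and the real technique's further requirement (the leader must already have committed an entry from its current term) is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Leader:
    leader_id: str
    cluster: list
    log: list = field(default_factory=list)  # committed (key, value) writes
    commit_index: int = 0                    # number of committed entries
    applied: int = 0                         # entries applied to `state`
    state: dict = field(default_factory=dict)

    def apply_up_to(self, index: int) -> None:
        while self.applied < index:
            key, value = self.log[self.applied]
            self.state[key] = value
            self.applied += 1

    def read_batch(self, keys: list, heartbeat_acks: set) -> Optional[list]:
        # 1. Remember the commit index at the moment the reads arrived.
        read_index = self.commit_index
        # 2. One quorum round confirms leadership for the whole batch.
        voters = (set(heartbeat_acks) | {self.leader_id}) & set(self.cluster)
        if len(voters) <= len(self.cluster) // 2:
            return None  # leadership not confirmed; the caller should retry
        # 3. Make sure the state machine has caught up to read_index.
        self.apply_up_to(read_index)
        # The reads themselves never enter the log or touch the disk.
        return [self.state.get(k) for k in keys]


leader = Leader("A", ["A", "B", "C", "D", "E"],
                log=[("x", 1), ("y", 2)], commit_index=2)
```

With acknowledgements from B and C, `leader.read_batch(["x", "y"], {"B", "C"})` answers both reads with one quorum round; with only B's acknowledgement it returns `None`, so the quorum contact is still unavoidable.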

Dave Turner
  • "A linearizable system does not need to be synchronous": I think maybe that is what I was hung up on. Could you explain this: "... having the operations in the sequence be writes _and_ _reads_"? If reads are not included in the sequence, then it would be possible for a client to read stale data before a sequence of writes has completed, right? So you would have reads dynamically injected into the sequence of writes as read requests come in from clients, so that a client read is ordered correctly and is guaranteed to get the most current data? Am I understanding this correctly? – red888 Dec 18 '19 at 20:19
  • I think you've got it, yes, but I added more detail to my answer on this subject. – Dave Turner Dec 18 '19 at 21:13
  • Oh, one last question: is there an actual implementation of this that would be a good reference? Preferably one using simpler Paxos, not a more complex implementation that handles Byzantine failures. I'd just like to see how this is fully implemented in a production system/product. – red888 Dec 18 '19 at 21:18
  • I don't know of a good and simple production-ready example of this. Production concerns add a lot of complexity. Maybe look at LogCabin, the reference Raft implementation, since I think it supports linearizable reads and contains a fair amount of productionisation. (Raft is itself approximately Paxos plus some productionisation.) – Dave Turner Dec 18 '19 at 21:52
  • For optimisation reasons, read commands can also be executed by the leader alone. *"Read commands need not be replicated across all nodes. It’s sufficient to execute them only on the leader."* (Howard, Heidi. 2014. “ARC: Analysis of Raft Consensus.”) – Andrei May 11 '22 at 07:32
  • The full quote imposes some important side conditions: "It’s sufficient to execute them only on the leader, assuming that the leader has committed an entry from its term and recently dispatched a successful AppendEntries to a majority of nodes". In particular I'm not sure you can even really define "recently dispatched" in a way that's guaranteed to be safe, at least not without making some extra assumptions about clock behaviour that Paxos avoids. – Dave Turner May 12 '22 at 10:28
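The "recently dispatched" condition is usually made precise with a leader lease, roughly as sketched in the Raft literature: if the leader sent a heartbeat at time `start` and a majority acknowledged it, no other leader can be elected before `start + election_timeout / clock_drift_bound`. That only holds under the extra assumptions the comment warns about: followers never start an election before their own election timeout expires, and no clock runs faster than the others by more than `clock_drift_bound`. The function names and parameters below are hypothetical; the sketch just makes that clock assumption explicit.

```python
def lease_expiry(heartbeat_sent_at: float, election_timeout: float,
                 clock_drift_bound: float) -> float:
    # clock_drift_bound is an ASSUMED bound on relative clock speed (>= 1).
    # Paxos itself needs no such assumption for safety.
    if clock_drift_bound < 1:
        raise ValueError("clock_drift_bound must be >= 1")
    return heartbeat_sent_at + election_timeout / clock_drift_bound


def may_serve_local_read(now: float, heartbeat_sent_at: float,
                         majority_acked: bool, election_timeout: float,
                         clock_drift_bound: float) -> bool:
    # Without a majority acknowledgement there is no lease at all.
    if not majority_acked:
        return False
    # Safe only while the lease holds on the leader's own clock, and
    # only if the drift assumption is actually true.
    return now < lease_expiry(heartbeat_sent_at, election_timeout,
                              clock_drift_bound)
```

For example, with a heartbeat sent at 1000 ms, a 150 ms election timeout and an assumed drift bound of 1.5, the lease runs until 1100 ms: a read at 1050 ms may be served locally, one at 1120 ms may not. If real clocks drift more than assumed, the check can wrongly succeed, which is exactly the safety concern raised above.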