
Background: I come from many years of Oracle experience. About 3 years ago, I started down the distributed path with Cassandra/DSE, and I have a very good grasp of Cassandra. Over the past month, I have heard CockroachDB mentioned several times, so now that I'm familiar with Cassandra, I get thrown this curve ball to look at. CockroachDB sounds a lot like Cassandra in how it writes, similar to a client CL of QUORUM (although I don't believe CockroachDB uses immutable files; it seems more like an RDBMS with physical rows stored as KV pairs). I also understand very well how Cassandra reads data, but I can't find any good documentation, videos, or discussions on the read mechanics of CockroachDB.

Let's assume this scenario:

3 nodes - a, b and c
RF=3
leader (node 'a') gets a write request
Writes to 2 nodes ('a' and 'b' - node 'c' is down)
leader acknowledges write
leader goes down (node 'a' is down) while node 'c' comes back up
leader becomes, say, node 'c'
read comes in for previously written data, above

As C didn't get the change, what is displayed to the client? Does the read use a quorum as well? If so, does it "fix" the data during the read? At some point, something has to "fix" the data. In Cassandra, changes are stored for 3 hours before being dropped (after which repair has to be run). What about CockroachDB? How are "lost changes" sent to nodes that were unavailable?
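To make the scenario concrete, here is a minimal sketch of the majority arithmetic it relies on. It assumes a plain majority quorum over the three replicas and says nothing about how CockroachDB actually implements it; the variable names are only illustrative.

```python
from itertools import combinations

nodes = {"a", "b", "c"}          # RF=3
majority = len(nodes) // 2 + 1   # 2 of 3

# The write in the scenario was acknowledged by 'a' and 'b' only.
acked_by = {"a", "b"}
write_committed = len(acked_by) >= majority

# Every possible majority shares at least one node with the set that acked
# the write, so a later majority always includes a replica that has the data.
overlaps = all(set(q) & acked_by for q in combinations(sorted(nodes), majority))

print(majority, write_committed, overlaps)
```

With 'a' down, the only remaining majority is {'b', 'c'}, and 'b' is the replica that still holds the write.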

I don't believe this level of detail is documented very well, or at least it isn't clear to me.

-Jim

Jim Wartnick

1 Answer


You have confused your problem statement by not clearly defining when things happen -- it is not clear whether A dies before or after C has become a new leader.

The reason why this matters is that when node C comes back up, it won't be able to win a leader election until it "catches up" with the Raft log, the history of committed writes so far. Until C has all the data that A and B had, it won't become the new leader.
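The "must catch up first" rule can be sketched with the vote check described in the Raft paper: a replica only grants its vote to a candidate whose log is at least as up to date as its own. This is a simplified illustration, not CockroachDB's code; `LogPosition`, `grants_vote`, and the term/index values are hypothetical.

```python
from typing import NamedTuple


class LogPosition(NamedTuple):
    """Term and index of the last entry in a replica's Raft log."""
    last_term: int
    last_index: int


def grants_vote(voter: LogPosition, candidate: LogPosition) -> bool:
    """Raft election restriction: compare the last entries of both logs.

    A real implementation also checks the current term and whether the
    voter has already voted in it; that is omitted here.
    """
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


# Hypothetical positions: B holds the acknowledged write at index 5,
# while C was down and stopped at index 4.
b = LogPosition(last_term=1, last_index=5)
c_stale = LogPosition(last_term=1, last_index=4)
c_caught_up = LogPosition(last_term=1, last_index=5)

print(grants_vote(b, c_stale))      # B refuses: C is missing the write
print(grants_vote(b, c_caught_up))  # B agrees once C has caught up
```

With A down, C needs B's vote to reach a majority of two, so a stale C cannot become leader.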

If node A dies before C has caught up, there won't be any leader anymore and the range will become unavailable (reads and writes will stall).

Does this clarify?

kena
  • Yes, that helps. Can you explain the raft log a bit more? Does that only exist on the leader nodes, or is that on each node? If A doesn't come back up, but C does, can B send the changes to C? Also, how long is the raft log kept for? Infinite amount of time? Thanks for the clarification. – Jim Wartnick Dec 17 '18 at 18:13
  • @JimWartnick The Raft log is present on all nodes (it's what tracks/serializes state mutations). If C comes back up, B can propagate its changes to C (what kena mentions as "catching up.") The Raft log is kept as long as the node is alive, i.e. in perpetuity from the node's perspective. – Loiselle Dec 17 '18 at 21:10
  • Thanks everyone for the clarifications. Can I dig a bit more into this raft log? Something must maintain it so it doesn't grow forever, i.e. at some point some of the transaction information must be purged from the raft logs to keep their size in check (at least I would think that is the case). – Jim Wartnick Dec 18 '18 at 13:40
  • You are correct. Periodically there are events called "epochs" that nodes agree on by consensus, where the agreement is that everyone has caught up to that point. When that occurs, everything before the epoch is purged. If a node was temporarily dead during that vote/decision and only comes back later, it will see a new epoch and simply be excluded from consensus (until it "resets" and brings in a fresh copy of the raft log from another node). I hope this clarifies. – kena Dec 18 '18 at 22:06
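The purge-and-reset behaviour described in the last comment can be sketched as a log that drops old entries, plus a decision about how to bring a lagging replica up to date. In general Raft terms this is usually described as log truncation plus a snapshot. The sketch below assumes that model; `TruncatedLog`, `truncate_before`, and `catch_up_plan` are hypothetical names, not CockroachDB APIs.

```python
from dataclasses import dataclass, field


@dataclass
class TruncatedLog:
    first_index: int                      # index of the oldest retained entry
    entries: list = field(default_factory=list)

    @property
    def last_index(self) -> int:
        return self.first_index + len(self.entries) - 1

    def truncate_before(self, index: int) -> None:
        """Purge every entry whose index is lower than `index`."""
        if index <= self.first_index:
            return
        keep_from = min(index, self.last_index + 1)
        self.entries = self.entries[keep_from - self.first_index:]
        self.first_index = keep_from


def catch_up_plan(log: TruncatedLog, follower_next_index: int):
    """Decide how to update a follower that needs `follower_next_index` next."""
    if follower_next_index < log.first_index:
        # The entries it needs were purged: it must take a fresh copy.
        return ("snapshot", [])
    offset = follower_next_index - log.first_index
    return ("entries", log.entries[offset:])


log = TruncatedLog(first_index=1, entries=["w1", "w2", "w3", "w4", "w5"])
print(catch_up_plan(log, 3))   # entries w3..w5 can still be replayed

log.truncate_before(4)         # everything before index 4 is purged
print(catch_up_plan(log, 3))   # too far behind: needs a fresh copy
print(catch_up_plan(log, 5))   # only slightly behind: replay w5
```

A replica that was down only briefly replays the missing entries, while one that missed a purge has to be reset from another node's copy.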