
The Couchbase 2.0 manual describes network partitioning as a potential issue.

http://docs.couchbase.org/couchbase-manual-2.0/couchbase-architecture.html#couchbase-architecture-failover-automatic-considerations

But I didn't see how (or if) Couchbase 2.0 deals with such issues on the datastore side.

My question is: how is CAS implemented in a cluster, and how do CAS operations deal with the split-brain problem? Is there a cluster-wide lock? Is it last-writer-wins?
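
For concreteness, this is roughly the pattern I mean by a CAS operation: read a value together with a CAS token, then write back only if the token still matches, retrying otherwise. It's a toy in-memory sketch, not a real Couchbase SDK call; the client class and names here are made up.

    class FakeClient:
        """A tiny in-memory stand-in; not a real Couchbase/memcached client."""

        def __init__(self):
            self._data = {}                      # key -> (value, cas_token)

        def set(self, key, value):
            self._data[key] = (value, 1)

        def gets(self, key):
            return self._data[key]               # value plus its current CAS token

        def cas(self, key, value, token):
            _, current = self._data[key]
            if current != token:                 # someone changed the key since gets()
                return False
            self._data[key] = (value, current + 1)
            return True

    def increment_counter(client, key, retries=10):
        """Increment a counter atomically, retrying if another writer wins the race."""
        for _ in range(retries):
            value, token = client.gets(key)
            if client.cas(key, value + 1, token):
                return value + 1
        raise RuntimeError("too many CAS conflicts for %r" % key)

    client = FakeClient()
    client.set("hits", 0)
    print(increment_counter(client, "hits"))     # -> 1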

John Cheng

2 Answers


The same question was asked on our Google Groups list: http://groups.google.com/group/couchbase/browse_thread/thread/e0d543d9b17f9c77

It's down at the bottom of the thread; the posts start on Aug 30.

Perry

Perry Krug

Membase and Couchbase Server 2.0 partition data. For each piece of data (vbucket) there is always a single server that is the source of truth.

The good side of this is that it's always strictly consistent. There's no need to design for conflict resolution, etc.

But when some node goes down, you simply lose access to a subset of your data. You can do a failover, in which case replicas will be promoted to masters for the vbuckets that were lost, thus 'recovering' access to those vbuckets. Note that losing some recent mutations is unavoidable in that case due to replication lag. And failover is a manual operation (although recent versions have a very carefully implemented and limited autofailover).
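
To make that model concrete, here is a simplified sketch of the key → vbucket → active-server mapping and of what a failover does. The node names, vbucket count, hash, and map format are made up for illustration; Couchbase's real mapping differs.

    import zlib

    NUM_VBUCKETS = 16   # Couchbase uses 1024; a small count keeps the example readable

    # vbucket id -> [active server, replica server] (a toy "vbucket map")
    vbucket_map = {vb: ["node-a", "node-b"] if vb % 2 == 0 else ["node-b", "node-a"]
                   for vb in range(NUM_VBUCKETS)}

    def vbucket_for(key):
        """Map a key to a vbucket id (illustrative hash, not Couchbase's exact scheme)."""
        return zlib.crc32(key.encode()) % NUM_VBUCKETS

    def active_server(key):
        """The single server that is the source of truth for this key."""
        return vbucket_map[vbucket_for(key)][0]

    def failover(dead_node):
        """Promote the first replica to active for every vbucket the dead node owned."""
        for chain in vbucket_map.values():
            if chain[0] == dead_node:
                chain.pop(0)   # drop the lost active copy; the replica takes over

    print(active_server("user:1001"))   # node-a or node-b, depending on the hash
    failover("node-a")
    print(active_server("user:1001"))   # node-b: the replica now serves node-a's vbuckets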

  • The question is how CAS is handled in a cluster. If there are multiple servers handling the same key (mapped to the same vbucket), does a CAS op compare against all servers in the cluster? – John Cheng Aug 30 '11 at 15:00
  • There's always one active server for a given vbucket, so CAS just has to be performed against that single server; see the sketch after these comments. – Aliaksei Kandratsenka Aug 30 '11 at 16:14
  • Under network partitioning there could be two servers in the cluster that consider themselves to be the primary. – John Cheng Aug 30 '11 at 20:28
  • Membase does not allow two servers to have write access to the same data at the same time. If this happens during a network partition, it's because an operator specifically told it to do so, knowing that it would introduce inconsistencies. You'd have to effectively duplicate the entire data set onto both sides. This is generally not the way Membase is meant to operate. Think of it more like RAID: if one copy goes away, another can replace the broken one until you replace it. – Dustin Aug 31 '11 at 07:13
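
To illustrate the points in these comments: CAS is checked locally on the single active server for a key's vbucket, so no cluster-wide lock is needed; and if an operator forces both halves of a partition to activate the same vbucket, each half accepts writes independently and the copies can diverge. This is a toy sketch with made-up server and key names, not Couchbase internals.

    class Server:
        """One node's local view of a vbucket's data."""

        def __init__(self, name):
            self.name = name
            self.data = {}                       # key -> (value, cas_token)

        def set(self, key, value):
            self.data[key] = (value, 1)

        def gets(self, key):
            return self.data[key]

        def cas(self, key, value, token):
            _, current = self.data[key]
            if current != token:                 # modified since the caller's gets()
                return False
            self.data[key] = (value, current + 1)
            return True

    # Normal operation: one active server per vbucket, CAS checked in one place.
    active = Server("node-a")
    active.set("k", "v0")
    _, token = active.gets("k")
    print(active.cas("k", "v1", token))          # True: single source of truth

    # Forced split-brain: the same vbucket made active on both sides of a partition.
    left, right = Server("node-a"), Server("node-b")
    for s in (left, right):
        s.set("k", "v1")
    _, lt = left.gets("k")
    left.cas("k", "left-write", lt)              # accepted on the left half
    _, rt = right.gets("k")
    right.cas("k", "right-write", rt)            # accepted on the right half
    print(left.gets("k")[0], right.gets("k")[0]) # left-write right-write -> divergence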