
I've been reading through the Couchbase Server documentation, and as I understand it, this is "how it all works":

  • A cluster has one or more nodes (servers).
  • A cluster has one or more buckets.
  • A bucket has one or more views.

My questions:

  1. I assume that the data in a bucket is distributed over all the nodes in a cluster, correct? Or is it replicated to all nodes?
  2. Assuming that a bucket spans several nodes in a cluster, does a view retrieve data from all of these nodes?
  3. Or are a bucket and its views specific to a certain node?
Lars Andren

1 Answer


Your assumption in point one is mostly correct. A bucket is sharded into 1024 vBuckets. These vBuckets are distributed across the nodes in the cluster (evenly, give or take a remainder), and replica vBuckets are placed on different nodes from the active (master) vBuckets they copy. By default, each vBucket is replicated to only one other node (so each document is replicated to one other node); however, you can configure additional replicas for greater availability if required.
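To make the sharding concrete, here is a minimal Python sketch of the idea, not Couchbase's actual implementation. The hash (a plain CRC32 modulo), the round-robin placement, and the names `vbucket_for_key`, `build_vbucket_map`, `locate` and the `node-a`/`node-b`/`node-c` hosts are all assumptions made up for illustration; the real server computes and maintains its own vBucket map.

```python
import zlib

NUM_VBUCKETS = 1024  # the figure quoted above


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a document key to a vBucket ID (illustrative CRC32-based hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_replicas=1):
    """Give every vBucket an active node plus replicas on *other* nodes.

    Hypothetical round-robin placement: entry 0 of each chain is the
    active copy, the remaining entries are replicas.
    """
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    vb_map = []
    for vb in range(NUM_VBUCKETS):
        chain = [nodes[(vb + i) % len(nodes)] for i in range(num_replicas + 1)]
        vb_map.append(chain)
    return vb_map


def locate(key, vb_map):
    """Return (vBucket ID, active node, replica nodes) for a document key."""
    vb = vbucket_for_key(key)
    active, *replicas = vb_map[vb]
    return vb, active, replicas


nodes = ["node-a", "node-b", "node-c"]  # hypothetical host names
vb_map = build_vbucket_map(nodes, num_replicas=1)
vb, active, replicas = locate("user::1234", vb_map)
print(f"vBucket {vb}: active on {active}, replica on {replicas}")
```

The point of the indirection is that a document is never tied to a node directly: it is tied to a vBucket, and only the vBucket-to-node map changes when nodes are added or removed.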

A view (defined in a design document) indexes the data for a particular bucket across every node and vBucket, but each node stores the index data only for the vBuckets it holds. So when you query a view, the query has to go to every node in the cluster. When you rebalance, by default, the index on a node changes as vBuckets are migrated off it: the index data is removed from the source node and regenerated on the node that receives the vBuckets.
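The "query goes to every node" behaviour is a scatter-gather pattern, which can be sketched as follows. This is a toy model rather than the Couchbase SDK or its view engine: `query_node`, `query_view` and the sample index rows are hypothetical, and each node's partial index is modelled as a list of `(key, doc_id)` rows already sorted by key.

```python
import heapq


def query_node(node_index, start_key=None, end_key=None):
    """Return one node's index rows within the key range, still sorted."""
    return [
        (key, doc_id)
        for key, doc_id in node_index
        if (start_key is None or key >= start_key)
        and (end_key is None or key <= end_key)
    ]


def query_view(cluster_indexes, start_key=None, end_key=None):
    """Scatter the query to every node, then merge the sorted partial results."""
    partials = [
        query_node(index, start_key, end_key)
        for index in cluster_indexes.values()
    ]
    return list(heapq.merge(*partials))


# Each node indexes only the documents in the vBuckets it holds.
cluster_indexes = {
    "node-a": [("apple", "doc1"), ("cherry", "doc7")],
    "node-b": [("banana", "doc3")],
    "node-c": [("date", "doc9")],
}

print(query_view(cluster_indexes))
print(query_view(cluster_indexes, start_key="b", end_key="d"))
```

Because no single node has the complete index, leaving any node out of the scatter step would silently drop rows from the result.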

A good overview of Couchbase Server's sharding architecture is given in the How-To NoSQL 3.0 Webinar on YouTube.

mrkwse
    Thanks a bunch! That was exactly the information I was interested in. I'll make sure to check that webinar out right away :) – Lars Andren May 02 '15 at 17:41