
I am currently looking at CouchDB and I understand that I have to specify all the replications by hand. If I want to use it on 100 nodes how would I do the replication?

  • Doing 99 "replicate to" and 99 "replicate from" on each node
    • This feels like overkill, since replicating with one node already carries the changes that node received from every other node
  • Doing 1 replicate to the next one to form a circle (like A -> B -> C -> A)
    • Would work until one node crashes; then everything downstream waits until it comes back
    • Latency would be high when replicating a change from the first node to the last

Isn't there a way to say: "here are 3 IPs on the full network. Connect to them and share with everyone as you see fit like an independent P2P" ?

Thanks for your insight

Simon Levesque
  • Maybe [BigCouch](https://github.com/cloudant/bigcouch) is what you should use instead? It basically takes big clusters of nodes and allows them to appear as a single instance of CouchDB to end-users/applications. – Dominic Barnes Nov 30 '12 at 14:59
  • I agree with Dominic. Have a look at Cloudant and save yourself the trouble. What you are probably after is sharding which is what BigCouch (and Cloudant) does for you. – AndyD Dec 11 '12 at 16:50

1 Answer


BigCouch won't provide the cross data-center stuff out of the box. Cloudant DBaaS (based on BigCouch) does have this setup already across several data-centers.

BigCouch is a sharded, "Dynamo-style" fork of Apache CouchDB--it is slated to be merged into "mainline" Apache CouchDB in the future, fwiw. The shards live across nodes (servers) in the same data-center. "Classic" CouchDB-style replication is used (afaik) to keep the BigCouch clusters in the various data-centers in sync.

CouchDB-style replication (n-master) is change-based, so replication only includes the latest changes.

You would need to set up to/from pairs of replication for each node/database combination. However, if all of your servers are intended to be identical, replication won't actually transfer much--changes are only sent when needed.
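To give a sense of the scale, here's a minimal sketch of enumerating those to/from pairs for a full mesh. The node names are hypothetical; in a real setup each pair would become a POST to `/_replicate` or a document in the `_replicator` database.

```python
# Sketch: enumerate the ordered (source, target) replication pairs
# needed for a full n-master mesh. Node names are made up for illustration.
from itertools import permutations

def replication_pairs(nodes):
    """Return every ordered (source, target) pair of distinct nodes --
    one 'replicate to' and one 'replicate from' per pair."""
    return list(permutations(nodes, 2))

pairs = replication_pairs(["node-a", "node-b", "node-c"])
# 3 nodes -> 6 ordered pairs; 100 nodes would need 100 * 99 = 9900
```

The quadratic growth is exactly why automating the setup matters once you go past a handful of nodes.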

If A gets a change, replication ships it to B and C (etc). However, if B--having just received that change--replicates it to C before A gets the chance to--due to network latency, etc--then when A does finally try, it will see the data is already there and not bother sending the change again.

If this is a standard part of your setup (i.e., every time you make a db you want it replicated everywhere else), then I'd highly recommend automating the setup.

Also, check out the _replicator database. It makes it much easier to manage what's going on: https://gist.github.com/fdmanana/832610
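For illustration, a `_replicator` document looks roughly like this (the hostnames and database name here are made up): you PUT it into the `_replicator` database on the node that should run the replication, and CouchDB keeps it running and records its status on the document.

```json
{
  "_id": "rep_node_a_to_node_b",
  "source": "http://node-a.example.com:5984/mydb",
  "target": "http://node-b.example.com:5984/mydb",
  "continuous": true
}
```

Deleting the document cancels the replication, which is much easier to manage than anonymous POSTs to `/_replicate`.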

Hope something in there is useful. :)

BigBlueHat