Parallel/Redundant Replication in CouchDB

Question

I have multiple CouchDB servers I want to keep in sync with each other, and I use these servers to share large files (e.g. >100 MB). To keep them synchronized, I have each CouchDB instance do a continuous pull replication from each other instance.

Here's an example: I have three CouchDB servers, A, B, and C, each of which runs a continuous pull replication from each of the others, like so:

------- <------------- -------
|  A  | -------------> |  B  |
-------                -------
  ^ |                   | ^
  | |                   | |
  | V                   | |
------- <---------------- |
|  C  | -------------------
-------
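
For concreteness, the full mesh above could be described by one replication document per arrow. This is only a sketch: the host names, the database name `files`, and the `_id` scheme are made up for illustration, and it assumes each document would be saved in the pulling server's `_replicator` database. Only the `source`, `target`, and `continuous` fields are standard replication-document fields.

```python
import json
from itertools import permutations

# Hypothetical server URLs and database name (not from the original question).
SERVERS = {
    "A": "http://a.example.com:5984",
    "B": "http://b.example.com:5984",
    "C": "http://c.example.com:5984",
}
DB_NAME = "files"


def pull_replication_docs(servers, db_name):
    """Build the replication documents for a full mesh of pull replications.

    Returns {target_name: [docs]}; each list holds the documents that the
    target server would store in its own _replicator database.
    """
    docs = {name: [] for name in servers}
    # Every ordered pair (target, source) with target != source is one arrow.
    for target, source in permutations(servers, 2):
        docs[target].append({
            "_id": f"pull-{db_name}-from-{source}",  # illustrative naming only
            "source": f"{servers[source]}/{db_name}",
            "target": f"{servers[target]}/{db_name}",
            "continuous": True,
        })
    return docs


mesh = pull_replication_docs(SERVERS, DB_NAME)
# Show the two documents server C would hold (pull from A and pull from B).
print(json.dumps(mesh["C"], indent=2))
```

With three servers this yields six documents in total, two per server, which matches the six arrows in the diagram.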

Someone uploads a document to server A with a 500MB attachment. B and C both start replicating the document from A, and B finishes the replication before C does:

-------    doc         -------
|  A  |--------------->|  B  |
-------                -------
   |
   | doc
   V
-------
|  C  |
-------

My question is, will C then start replicating the same document from B (since C also has a continuous pull replication from B), while it is still transferring the document from A?

-------                -------
|  A  |                |  B  |
-------                -------
   |          doc         |
doc|    |------------------
   |    |
   V    V
  -------
  |  C  |
  -------

I would guess this would happen, since AFAIK CouchDB replication doesn't actually store the replicated documents on the target (using the _bulk_docs API) until the documents (including attachments) have been fully fetched from the source[1]. I am worried about this because it would be redundant and a big waste of bandwidth.

[1] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm

Have you looked at BigCouch yet for replication? Wouldn't have to do it manually. — ryan1234, Jan 05 '13 at 05:15
I've looked at BigCouch, however I am creating mobile ad-hoc networks with devices that come and go. AFAIK, for BigCouch, you have to set up everything statically for clustering and replication. But thanks for the suggestion! — Dan S, Jan 14 '13 at 02:30
I'm working on a project with BigCouch and mobile devices that can connect to a cluster and it works great. The idea is that you put Couchbase Mobile (or TouchDB) on the mobile device and then you have a cluster of BigCouch machines behind a load balancer. Devices connect and reference a database and data is replicated down. But maybe your use case is a little different. — ryan1234, Jan 14 '13 at 02:34

score 1 · Accepted Answer · answered Oct 23 '13 at 15:34

According to a recent discussion on the CouchDB users@ list, and to this document describing the replication algorithm, the replicator knows which attachments are already present on the target. If, however, the attachments are very large and a second replication starts before the first has finished, the attachment will be transferred multiple times.
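
The race described above can be illustrated with a toy model. This is not CouchDB's replicator code; it is a simplified simulation that assumes each pull checks the target once, when it starts, and that a document only becomes visible on the target after its transfer has completed. The function name and the timings are invented for illustration.

```python
def simulate_pulls(pulls, size_mb):
    """Return the total MB transferred to one target in a toy model.

    pulls: list of (source_name, start_time, duration) tuples, one per pull
    replication that notices the new document. A pull skips the transfer only
    if an earlier transfer had already finished when the pull started.
    """
    finished_at = []  # completion times of transfers that were started
    transferred = 0
    for _source, start, duration in sorted(pulls, key=lambda p: p[1]):
        already_on_target = any(t <= start for t in finished_at)
        if already_on_target:
            continue  # the target already has the document, nothing to fetch
        transferred += size_mb
        finished_at.append(start + duration)
    return transferred


# C pulls from A starting at t=0 over a slow link (100 s). B gets the document
# at t=40, so C's pull from B starts at t=40, while the first pull is in flight.
overlapping = simulate_pulls([("A", 0, 100), ("B", 40, 60)], size_mb=500)

# Same setup, but C's pull from A completes at t=30, before B has the document.
sequential = simulate_pulls([("A", 0, 30), ("B", 40, 60)], size_mb=500)

print(overlapping, sequential)
```

In this model the overlapping case moves the 500 MB attachment twice, while the sequential case moves it once, which is the behaviour the answer describes for very large attachments.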
