I have multiple CouchDB servers that I want to keep in sync with each other, and I use these servers to share large files (e.g. over 100 MB). To keep them synchronized, each CouchDB instance runs a continuous pull replication from every other instance.
Here's an example: I have three CouchDB servers, A, B, and C, all of which have continuous pull replications from each other, like so:
    -------  <-------------  -------
    |  A  |  ------------->  |  B  |
    -------                  -------
     ^  |                     |  ^
     |  |                     |  |
     |  V                     |  |
    -------  <----------------   |
    |  C  |  --------------------
    -------
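For concreteness, the setup above can be sketched as the set of request bodies one would POST to each server's `_replicate` endpoint. This is only an illustration: the server URLs and the database name `files` are made up, and the snippet builds the JSON bodies without sending anything.

```python
import json
from itertools import permutations

# Hypothetical server URLs and database name; substitute your own.
SERVERS = {
    "A": "http://a.example.com:5984",
    "B": "http://b.example.com:5984",
    "C": "http://c.example.com:5984",
}
DB = "files"


def mesh_replications(servers, db):
    """Return (target_name, request_body) for every ordered pair of servers."""
    jobs = []
    for source, target in permutations(sorted(servers), 2):
        body = {
            "source": f"{servers[source]}/{db}",  # database to pull from
            "target": f"{servers[target]}/{db}",  # database on the pulling server
            "continuous": True,                   # keep the replication running
        }
        jobs.append((target, body))
    return jobs


jobs = mesh_replications(SERVERS, DB)
for target, body in jobs:
    # Each body would be POSTed to <target URL>/_replicate, so that the
    # target server is the one doing the pulling.
    print(target, json.dumps(body))
```

With three servers this yields six replications, two of them pulling into C (one from A and one from B), which is the pair the question below is about.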
Someone uploads a document with a 500 MB attachment to server A. B and C both start replicating the document from A, and B finishes the replication before C does:
    -------      doc      -------
    |  A  |-------------->|  B  |
    -------               -------
       |
       | doc
       V
    -------
    |  C  |
    -------
My question is, will C then start replicating the same document from B (since C also has a continuous pull replication from B), while it is still transferring the document from A?
     -------                  -------
     |  A  |                  |  B  |
     -------                  -------
        |          doc           |
     doc|   |---------------------
        |   |
        V   V
       -------
       |  C  |
       -------
I would guess that it does, since AFAIK CouchDB replication doesn't actually store the replicated documents on the target (using the _bulk_docs API) until the documents (including attachments) have been fully fetched from the source [1]. This worries me because the second transfer would be redundant and a big waste of bandwidth.
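To make the scenario I'm worried about concrete, here is a toy model of it. It is not CouchDB code: `ToyTarget` and its methods are hypothetical names, and the model simply bakes in my assumption that a revision only counts as present on the target once its transfer has completely finished.

```python
class ToyTarget:
    """Hypothetical stand-in for target database C; not the CouchDB API."""

    def __init__(self):
        self.stored = set()   # revisions fully written to the target
        self.fetch_log = []   # (source, rev) for every transfer that was started

    def revs_diff(self, revs):
        # Assumption: only fully stored revisions count as present.
        return [r for r in revs if r not in self.stored]

    def start_pull(self, source, revs):
        # Ask which revisions are missing, then start fetching those.
        missing = self.revs_diff(revs)
        for rev in missing:
            self.fetch_log.append((source, rev))
        return missing

    def finish_pull(self, revs):
        # Assumption: the write happens only after the whole fetch is done.
        self.stored.update(revs)


c = ToyTarget()
rev = "1-abc"  # made-up revision ID for the document with the 500 MB attachment

in_flight_from_a = c.start_pull("A", [rev])  # C begins fetching from A
in_flight_from_b = c.start_pull("B", [rev])  # B has the doc now; C checks B too
c.finish_pull(in_flight_from_a)              # transfer from A completes
after_commit = c.start_pull("B", [rev])      # nothing is missing any more

print(c.fetch_log)
```

In this model the same revision gets fetched twice, once from A and once from B, because the check against B happens before the transfer from A has been written. Whether the real replicator behaves this way is exactly what I'm asking.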
[1] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm