I have a huge master CouchDB database and a read-only slave CouchDB database that synchronizes with the master.
Because the rate of changes is high and the channel between the servers is slow and unstable, I want to set an order/priority that defines which documents are replicated first. I need to ensure that the highest-priority documents are definitely at their latest version; I can ignore documents at the end of the list.

SORTING, not FILTERING

If that is not possible, what alternative would work?

Resources I have already looked at:
http://wiki.apache.org/couchdb/Replication
http://couchapp.org/page/index

UPDATE: the master database is actually the Node.js NPM registry, and the order is the list of Most Depended-upon Packages. I am trying to build a proxy, because cloning 50G always fails after a while. The fact is, "we don't need 90% of those modules, but quick & reliable access to those we depend on."

Jonathan Hall
Paul Verest

3 Answers

CouchDB, out of the box, does not provide you with any options to control the order of replication. I'm guessing you could piece something together if you keep documents with different priorities in different databases on the master, though. Then, you could replicate the high-priority master database into the slave database first, replicate lower-priority databases after that, etc.
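The per-priority database idea above can be sketched as follows. This is only an illustration, not a tested setup: the host names and the database names `registry_high` and `registry_low` are hypothetical, and the code only builds the JSON bodies that would be sent, one at a time and in list order, to CouchDB's `POST /_replicate` endpoint.

```python
def tiered_replication_jobs(master_url, target_db_url, tier_dbs):
    """Build one /_replicate request body per priority database.

    tier_dbs lists the (hypothetical) per-priority databases on the
    master, highest priority first. All tiers replicate into the same
    target database on the slave.
    """
    base = master_url.rstrip("/")
    return [
        {"source": f"{base}/{db}", "target": target_db_url}
        for db in tier_dbs
    ]


if __name__ == "__main__":
    jobs = tiered_replication_jobs(
        "http://master.example:5984",          # assumed master address
        "http://slave.example:5984/registry",  # assumed slave database
        ["registry_high", "registry_low"],     # hypothetical tier names
    )
    for job in jobs:
        print(job)
```

A one-shot (non-continuous) replication request returns only when that replication has finished, so posting these bodies sequentially gives the "high priority first" ordering between tiers. It does not order documents within a tier.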

djc

The short answer is no.

The long answer is that CouchDB provides ACID guarantees at the individual document level only, by design. The replicator will update each document atomically when it replicates (as can anyone; the replicator is just using the public API), but it does not guarantee ordering, mostly because it uses multiple HTTP connections to improve throughput. You can configure that down to 1 if you like and you'll get better ordering, but it's not a panacea.
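As a rough illustration of "configuring that down to 1", the sketch below builds a replication request body that asks for a single worker and a single HTTP connection. The option names `worker_processes` and `http_connections` are the ones used in CouchDB's replicator settings; whether they can be overridden per replication request, rather than only in the server configuration, depends on the CouchDB version and is an assumption here. The URLs are placeholders.

```python
def serial_replication_body(source, target):
    """Replication body requesting one worker and one HTTP connection.

    Assumption: the target CouchDB version honours these replicator
    options when they are supplied in the request body.
    """
    return {
        "source": source,
        "target": target,
        "worker_processes": 1,  # a single replication worker
        "http_connections": 1,  # a single connection to the source
    }


if __name__ == "__main__":
    print(serial_replication_body(
        "http://master.example:5984/registry",  # assumed master database
        "http://slave.example:5984/registry",   # assumed slave database
    ))
```

Even with these settings, this only reduces reordering caused by parallelism; it gives no control over which documents come first.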

After the BigCouch merge, all bets are off: there will be multiple sources and multiple targets with no imposed total order.

Robert Newson

You could set up filtered replication or named document replication.

Both of these are alternatives to replicating an entire database. You could do the replication in smaller batch sizes, and order the batches to match your priorities.
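A minimal sketch of the batching idea, using named document replication (the `doc_ids` field of a `POST /_replicate` body). The ranked list of document IDs is assumed to come from somewhere else, for example a scrape of the most-depended-upon list; the IDs, host names, and batch size below are placeholders.

```python
import json
import urllib.request


def batched_doc_id_jobs(source, target, ranked_ids, batch_size):
    """Split a priority-ordered ID list into /_replicate request bodies.

    ranked_ids must already be sorted, most important first. Each body
    uses named document replication via the "doc_ids" field.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {
            "source": source,
            "target": target,
            "doc_ids": ranked_ids[start:start + batch_size],
        }
        for start in range(0, len(ranked_ids), batch_size)
    ]


def post_replication(server_url, body):
    """POST one body to /_replicate and return the decoded JSON reply."""
    request = urllib.request.Request(
        server_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    ranked = ["pkg-a", "pkg-b", "pkg-c", "pkg-d", "pkg-e"]  # placeholder IDs
    jobs = batched_doc_id_jobs(
        "http://master.example:5984/registry",  # assumed master database
        "http://slave.example:5984/registry",   # assumed slave database
        ranked,
        batch_size=2,
    )
    for job in jobs:
        print(job["doc_ids"])
        # post_replication("http://slave.example:5984", job)  # needs a live server
```

Running the batches in order, and re-running the first few batches more often than the rest, keeps the top of the list fresher than the tail. Ordering inside a single batch is still not guaranteed.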

Jonathan Hall
Teddy
  • I have already seen that, but the point is SORTING, not FILTERING – Paul Verest Mar 08 '13 at 05:06
  • Even if the end result is that all the documents are replicated, just in prioritized order? – mzedeler Mar 08 '13 at 20:59
  • @PaulV, this solution is to use filtered replication to replicate the highest priority documents first. Then iteratively decrease the priority level of the filter until the entire database is replicated. – David V Mar 09 '13 at 04:06
  • Well, I don't have priority levels. I just have a long list, https://npmjs.org/browse/depended, and want the most depended-upon packages to be replicated first. – Paul Verest Mar 11 '13 at 08:05