CouchDB: difference between max_replication_retry_count and retries_per_request

Question

I'm currently exploring CouchDB replication and trying to figure out the difference between the max_replication_retry_count and retries_per_request configuration options in the [replicator] section of the configuration file.

Basically, I want to configure continuous replication from a local CouchDB to a remote instance so that it never stops retrying, even across potentially long offline periods (days or even weeks). So I'd like infinite replication attempts with a maximum retry interval of 5 minutes or so. Can I do this? Do I need to change the default configuration to achieve it?

score 2 · Answer 1 · answered Jun 20 '17 at 16:26

Here are the replies I got on the CouchDB mailing lists:

If we are talking about CouchDB 1.6, the retries_per_request attribute controls the number of attempts the current replication makes to read the _changes feed before giving up. The max_replication_retry_count attribute controls the number of times the whole replication job is retried by the replication manager. Setting this attribute to "infinity" should make the replication manager never give up.

I don't think the interval between those attempts is configurable. As far as I understand, it starts at 2.5 seconds between retries and then doubles until it reaches 10 minutes, which is the hard upper limit.

Extended answer:

The answer is slightly different depending on whether you're using a 1.x/2.0 release or current master.

If you're using a 1.x or 2.0 release: Set "max_replication_retry_count = infinity" so it will always retry failed replications. That setting controls how the whole replication job restarts if there is any error. Then "retries_per_request" can be used to handle errors for individual replicator HTTP requests, basically the case where a quick immediate retry succeeds. The default value for "retries_per_request" is 10. After the first failure, there is a 0.25 second wait. On the next failure it doubles to 0.5 seconds, and so on. The maximum wait interval is 5 minutes. But if you expect to be offline routinely, it may not be worth retrying individual requests for too long, so reduce "retries_per_request" to 6 or 7. Individual requests would then retry a few times for about 10 - 20 seconds, after which the whole replication job will crash and retry.
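To make the arithmetic above concrete, here is a minimal Python sketch of the per-request wait schedule as described: 0.25 seconds, doubling, capped at 5 minutes. The function name `request_retry_waits` is made up for illustration, and the sketch assumes one wait after every failed attempt, which may be off by one compared to what CouchDB actually does.

```python
def request_retry_waits(retries, first_wait=0.25, max_wait=300.0):
    """Return the list of waits (in seconds) between retries of one request.

    Assumption: one wait follows each failed attempt, starting at
    first_wait and doubling each time, never exceeding max_wait.
    """
    waits = []
    wait = first_wait
    for _ in range(retries):
        waits.append(min(wait, max_wait))
        wait *= 2
    return waits


if __name__ == "__main__":
    for retries in (6, 7, 10):
        waits = request_retry_waits(retries)
        print(retries, "retries ->", sum(waits), "seconds in total")
```

Under that assumption, 6 retries add up to 15.75 seconds of waiting, which is in line with the "about 10 - 20 seconds" estimate, while the default of 10 adds up to a bit over 4 minutes.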

If you're using current master, which has the new scheduling replicator: No need to set "max_replication_retry_count". That setting is gone, and all replication jobs will always retry for as long as the replication document exists. "retries_per_request" works the same as above. The replication scheduler also does exponential backoff when replication jobs fail consecutively. The first backoff is 30 seconds. Then it doubles to 1 minute, 2 minutes, and so on. The maximum backoff wait is about 8 hours. But if you don't want to wait 4 hours on average for the replication to restart when network connectivity is restored, and want it to be about 5 minutes or so, set "max_history = 8" in the "replicator" config section. max_history controls how much history of past events is retained for each replication job. If there is less history of consecutive crashes, the backoff wait interval will also be shorter.
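The scheduler backoff described above can be sketched in Python as follows. This only models the doubling from 30 seconds up to a cap. The cap of 10 doublings (30720 seconds, roughly 8.5 hours) is an assumption chosen to match "about 8 hours", and the function name is hypothetical. How max_history translates into the number of crashes the scheduler counts is not spelled out here, so the crash count is simply an input.

```python
def scheduler_backoff_seconds(consecutive_crashes, first=30, max_doublings=10):
    """Upper bound on the wait before a crashed job is restarted.

    Assumptions: the first crash gives `first` seconds, each further
    consecutive crash doubles it, and doubling stops after
    `max_doublings` steps (30 * 2**10 s is about 8.5 hours).
    """
    if consecutive_crashes < 1:
        return 0
    doublings = min(consecutive_crashes - 1, max_doublings)
    return first * 2 ** doublings


if __name__ == "__main__":
    for crashes in (1, 2, 3, 4, 5, 11):
        print(crashes, "crashes ->", scheduler_backoff_seconds(crashes), "s")
```

In this model the wait stays within single-digit minutes as long as only four or five consecutive crashes are remembered, which is the effect that lowering max_history is meant to achieve.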

So to summarize, for 1.x/2.0 releases:

[replicator]
max_replication_retry_count = infinity
retries_per_request = 6

For current master:

[replicator]
max_history = 8
retries_per_request = 6
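For completeness, the continuous replication itself is defined by a document in the _replicator database. The Python sketch below only builds the HTTP request for such a document and does not send it. The host names, database name, and document ID are placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_replication_request(couch_url, doc_id, source, target):
    """Build (but do not send) a PUT request creating a continuous
    replication document in the _replicator database."""
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        url=f"{couch_url}/_replicator/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder URLs and names, for illustration only.
req = build_replication_request(
    "http://localhost:5984",
    "local-to-remote",
    "http://localhost:5984/mydb",
    "https://remote.example.com/mydb",
)
# urllib.request.urlopen(req)  # uncomment to actually create the document
```

As long as that document exists, the retry settings above decide how persistently the job is restarted.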
