I've been dealing with this issue for weeks now.
I have the following scenario:
couchdb2.3.1-A <===> couchdb2.3.1-B <===> couchdb3.1.1-A <===> couchdb3.1.1-B
where <===> represents two pull replications, one configured on each side (i.e. couchdb1 pulls from couchdb2 and vice versa).
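To make the setup concrete, here is a minimal sketch of how one side of a <===> pair is configured, assuming continuous pull replications defined through the `_replicator` database (the hostnames, credentials, and database name below are placeholders, not our real config):

```python
import requests

# Placeholder hosts/credentials for illustration only.
LOCAL = "http://admin:secret@couchdb-a:5984"   # node where the replication runs
REMOTE = "http://admin:secret@couchdb-b:5984"  # peer it pulls from
DB = "mydb"                                    # placeholder database name

# A pull replication: the local node fetches changes from the remote one,
# so the remote URL is the source and the local URL is the target.
repl_doc = {
    "source": f"{REMOTE}/{DB}",
    "target": f"{LOCAL}/{DB}",
    "continuous": True,  # assumption: replications are continuous, not one-shot
}

# Creating a document in the local _replicator database starts the job.
resp = requests.put(f"{LOCAL}/_replicator/pull-{DB}-from-b", json=repl_doc)
resp.raise_for_status()
print(resp.json())
```

The mirror-image document on the other node pulls in the opposite direction, which is what gives the two-way arrow in the diagram above.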
CouchDB is running in Docker containers.
If a write is made at couchdb2.3.1-A, it has to propagate through all the servers until it reaches couchdb3.1.1-B.
All of them have an exclusive HDD; CouchDB does not share its disk with any other service.
couchdb2.3.1-A and couchdb2.3.1-B have no problems.
couchdb3.1.1-A's disk latency gradually increased over time, so we stopped making write requests to it and started talking only to couchdb3.1.1-B. couchdb3.1.1-A still receives writes, but only via the replication protocol. Its disk latency did not change.
Changes we've made since the problem started:
- Upgraded the kernel from 4.15.0-55-generic to 5.4.0-88-generic
- Upgraded Ubuntu from 18.04 to 20.04
- Deleted the _global_changes database from couchdb3.1.1-A
More info:
- CouchDB is using Docker local-persist volumes.
- Disks are WD Purple for the 2.3.1 CouchDBs and WD Black for the 3.1.1 CouchDBs.
- We have only one database of 88GiB and 2 views: one of 22GB and a little one of 30MB (highly updated).
- docker stats shows that couchdb3.1.1 uses a lot of memory compared to 2.3.1:
  - 3.5GiB for couchdb3.1.1-A (not receiving direct write requests)
  - 8.0GiB for couchdb3.1.1-B (receiving both read and write requests)
  - 226MiB for 2.3.1-A
  - 552MiB for 2.3.1-B
- Database compaction is run at night (roughly as sketched after this list); the problem only occurs during the day, when most of the writes are made.
- Most of the config is default.
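For reference, the nightly compaction amounts to calls like the sketch below; the node URL, database name, and design-document names are placeholders, not our actual script:

```python
import requests

COUCH = "http://admin:secret@couchdb-a:5984"    # placeholder node URL/credentials
DB = "mydb"                                     # placeholder database name
HEADERS = {"Content-Type": "application/json"}  # _compact requires a JSON content type

# Compact the database file itself.
requests.post(f"{COUCH}/{DB}/_compact", headers=HEADERS).raise_for_status()

# Compact the view indexes; the design-document names here are made up for illustration.
for ddoc in ("big_view", "small_view"):
    requests.post(f"{COUCH}/{DB}/_compact/{ddoc}", headers=HEADERS).raise_for_status()
```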
Latency graph from Munin monitoring:
Any help is appreciated.