I've been dealing with this issue for weeks now.
I have the following scenario:
couchdb2.3.1-A <===> couchdb2.3.1-B <===> couchdb3.1.1-A <===> couchdb3.1.1-B
where <===> represents two pull replications, one configured on each side (i.e. couchdb1 pulls from couchdb2 and vice versa).
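To make the setup concrete, here is a minimal sketch of how one side of a <===> pair is configured, assuming continuous pull replications defined through the `_replicator` database (the hostnames, credentials, and database name below are placeholders, not our real config):

```python
import requests

# Placeholder hosts/credentials for illustration only.
LOCAL = "http://admin:secret@couchdb-a:5984"   # node where the replication runs
REMOTE = "http://admin:secret@couchdb-b:5984"  # peer it pulls from
DB = "mydb"                                    # placeholder database name

# A pull replication: the local node fetches changes from the remote one,
# so the remote URL is the source and the local URL is the target.
repl_doc = {
    "source": f"{REMOTE}/{DB}",
    "target": f"{LOCAL}/{DB}",
    "continuous": True,  # assumption: replications are continuous, not one-shot
}

# Creating a document in the local _replicator database starts the job.
resp = requests.put(f"{LOCAL}/_replicator/pull-{DB}-from-b", json=repl_doc)
resp.raise_for_status()
print(resp.json())
```

The mirror-image document on the other node pulls in the opposite direction, which is what gives the two-way arrow in the diagram above.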
CouchDB is running in Docker containers.
If a write is made at couchdb2.3.1-A, it has to propagate through all the servers until it reaches couchdb3.1.1-B.
All of them have an exclusive HDD; CouchDB does not share its disk with any other service.
couchdb2.3.1-A and couchdb2.3.1-B have no problems.
couchdb3.1.1-A's disk latency gradually increased over time, so we stopped making write requests to it and started talking only to couchdb3.1.1-B. couchdb3.1.1-A still receives writes, but only via the replication protocol. Its disk latency did not change.
Changes we've made since the problem started:
- Upgraded the kernel from 4.15.0-55-generic to 5.4.0-88-generic
- Upgraded Ubuntu from 18.04 to 20.04
- Deleted the _global_changes database from couchdb3.1.1-A
More info:
- CouchDB is using Docker local-persist volumes.
- Disks are WD Purple for the 2.3.1 CouchDBs and WD Black for the 3.1.1 CouchDBs.
- We have only one database of 88GiB and 2 views: one of 22GB and a little one of 30MB (highly updated).
- docker stats shows that couchdb3.1.1 uses a lot of memory compared to 2.3.1:
  - 3.5GiB for couchdb3.1.1-A (not receiving direct write requests)
  - 8.0GiB for couchdb3.1.1-B (receiving both read and write requests)
  - 226MiB for 2.3.1-A
  - 552MiB for 2.3.1-B
- Database compaction is run at night (roughly as sketched after this list); the problem only occurs during the day, when most of the writes are made.
- Most of the config is default.
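For reference, the nightly compaction amounts to calls like the sketch below; the node URL, database name, and design-document names are placeholders, not our actual script:

```python
import requests

COUCH = "http://admin:secret@couchdb-a:5984"    # placeholder node URL/credentials
DB = "mydb"                                     # placeholder database name
HEADERS = {"Content-Type": "application/json"}  # _compact requires a JSON content type

# Compact the database file itself.
requests.post(f"{COUCH}/{DB}/_compact", headers=HEADERS).raise_for_status()

# Compact the view indexes; the design-document names here are made up for illustration.
for ddoc in ("big_view", "small_view"):
    requests.post(f"{COUCH}/{DB}/_compact/{ddoc}", headers=HEADERS).raise_for_status()
```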
Latency graph from Munin monitoring:
Any help is appreciated.