
I've been dealing with this issue for weeks now.

I have the following scenario:

couchdb2.3.1-A <===> couchdb2.3.1-B <===> couchdb3.1.1-A <===> couchdb3.1.1-B

where <===> represents two pull replications, one configured on each side, i.e. each node pulls from its neighbour and vice versa.
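
For context, each pull replication is set up by creating a document in the _replicator database of the node that pulls, with the remote node as the source. A minimal sketch using Python's requests; the hostnames, credentials and the database name "mydb" are placeholders, not the real ones:

```python
# Minimal sketch: hostnames, credentials and the database name "mydb" are placeholders.
import requests

SOURCE = "http://couchdb2-a.example:5984"   # node being pulled from (placeholder)
TARGET = "http://couchdb2-b.example:5984"   # node that pulls (placeholder)
AUTH = ("admin", "password")                # placeholder credentials

# A pull replication lives on the pulling node: the replication document is
# created in that node's _replicator database, with the remote node as source.
repl_doc = {
    "source": f"{SOURCE}/mydb",
    "target": f"{TARGET}/mydb",
    "continuous": True,
}
resp = requests.put(f"{TARGET}/_replicator/pull-from-A", json=repl_doc, auth=AUTH)
resp.raise_for_status()
print(resp.json())
```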

CouchDB is running in Docker containers.

If a write is made at couchdb2.3.1-A, it has to travel through all servers until it reaches couchdb3.1.1-B.
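
A quick way to verify that a write has propagated through the whole chain is to compare the document's _rev on each node; a minimal sketch, where the hostnames, credentials, database name and document id are placeholders:

```python
# Minimal sketch: hostnames, credentials, database name and document id are placeholders.
import requests

NODES = {
    "couchdb2.3.1-A": "http://couchdb2-a.example:5984",
    "couchdb2.3.1-B": "http://couchdb2-b.example:5984",
    "couchdb3.1.1-A": "http://couchdb3-a.example:5984",
    "couchdb3.1.1-B": "http://couchdb3-b.example:5984",
}
AUTH = ("admin", "password")
DOC_ID = "some-doc-id"  # a document written at couchdb2.3.1-A

# If every node reports the same _rev, the write has been replicated end to end.
for name, url in NODES.items():
    r = requests.get(f"{url}/mydb/{DOC_ID}", auth=AUTH)
    rev = r.json().get("_rev") if r.ok else f"HTTP {r.status_code}"
    print(f"{name}: {rev}")
```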

All of them have an exclusive HDD; CouchDB does not share its disk with any other service.

couchdb2.3.1-A and -B have no problems.

Disk latency on couchdb3.1.1-A gradually increased over time, so we stopped sending write requests to it and now talk only to couchdb3.1.1-B. couchdb3.1.1-A still receives writes, but only via the replication protocol. Disk latency did not change.

Changes we've made since the problem started:

  • Upgraded kernel from 4.15.0-55-generic to 5.4.0-88-generic
  • Upgraded ubuntu from 18.04 to 20.04
  • Deleted _global_changes database from couchdb3.1.1-A

More info:

  • CouchDB is using Docker local-persist volumes.
  • Disks are WD Purple for the 2.3.1 CouchDBs and WD Black for the 3.1.1 CouchDBs.
  • We have only one database of 88GiB and 2 views: one of 22GB and a small one of 30MB (updated very frequently).
  • docker stats shows that couchdb3.1.1 uses a lot of memory compared to 2.3.1:
    • 3.5GiB for couchdb3.1.1-A (not receiving direct write requests)
    • 8.0GiB for couchdb3.1.1-B (receiving both read and write requests)
    • 226MiB for couchdb2.3.1-A
    • 552MiB for couchdb2.3.1-B
  • Database compaction is run at night (see the sketch after this list). The problem only occurs during the day, when most of the writes are made.
  • Most of the config is default.
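
For reference, the nightly compaction corresponds to CouchDB's _compact and _view_cleanup endpoints. A minimal sketch; the host, credentials, database name and design-document name are placeholders:

```python
# Minimal sketch: host, credentials, database name and design-document name are placeholders.
import requests

COUCH = "http://couchdb3-a.example:5984"
AUTH = ("admin", "password")
HEADERS = {"Content-Type": "application/json"}   # required by the _compact endpoints

# Compact the database file itself.
requests.post(f"{COUCH}/mydb/_compact", auth=AUTH, headers=HEADERS).raise_for_status()

# Compact the view index of one design document (repeat per design doc).
requests.post(f"{COUCH}/mydb/_compact/my_ddoc", auth=AUTH, headers=HEADERS).raise_for_status()

# Remove index files for view signatures that are no longer current.
requests.post(f"{COUCH}/mydb/_view_cleanup", auth=AUTH, headers=HEADERS).raise_for_status()
```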

Latency graph from munin monitoring:

[disk latency graph]

Any help is appreciated.
