I am running MongoDB 4.x on Linux in a three-member replica set, and the mongod process's memory sometimes spikes inexplicably (a sudden increase of more than 10% of the machine's 64 GB of RAM) and doesn't come back down, sometimes for hours. Occasionally this happens several times in a short period, which pushes the machine into swap and ultimately slows the whole database down, increasing replication lag and causing general cluster instability.
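For context, this is roughly how I watch a spike from inside the server (a minimal check using serverStatus; I compare the process RSS against the WiredTiger cache size to see whether the growth is inside or outside the cache):

```
// Snapshot of mongod's own view of its memory.
var status = db.serverStatus();
printjson({
    residentMB: status.mem.resident,    // process resident set size, in MB
    virtualMB: status.mem.virtual,      // virtual size, in MB
    wtCacheBytes: status.wiredTiger.cache["bytes currently in the cache"]
});
```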
The workload is fairly heavy: average CPU load of 50-80% on an 8-core machine, and average memory consumption around 70% of the 64 GB. It is a mix of high-speed writes and batched reads. I try to direct all heavy reads to the secondaries so the primary can focus on writes, but sometimes large reads hit the primary too.
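For reference, the heavy reads are routed with a secondary read preference on the cursor, roughly like this (the collection name and filter are just placeholders, not my real schema):

```
// Batched read directed at secondaries so it doesn't land on the primary.
db.events.find({ processed: false })
         .readPref("secondaryPreferred")
         .batchSize(1000);
```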
During a spike, running db.currentOp() doesn't reveal anything long-running, yet queries that should be near-instant (a simple find() on a tiny collection) can take seconds around the same time.
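This is roughly the check I do during a spike (the two-second threshold is arbitrary):

```
// List operations that have been active for more than ~2 seconds.
// During the spikes this comes back essentially empty.
db.currentOp({
    active: true,
    secs_running: { $gte: 2 }
});
```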
What can I do to see which queries are suddenly consuming all this memory? I have been looking for slow queries, but slowness feels like an (inaccurate) proxy for what is actually eating the memory.
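For completeness, this is what my slow-query hunting looks like so far (profiler enabled on one member; the 100 ms threshold is arbitrary):

```
// Profile operations slower than 100 ms, then inspect the most recent
// entries. Slowness is the only signal here, not memory usage.
db.setProfilingLevel(1, 100);
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();
```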