Environment: GKE, 6 nodes, each node with 16 GB RAM (shared with other pods) and 4 cores (also shared). MongoDB deployment: Bitnami Helm chart version 13.5.x, replica set architecture (3 data-bearing members and 1 arbiter).
I was trying to remove a lot of dirty data (about 100,000 documents, each estimated at ~2 KB) on my primary MongoDB member via port-forwarding directly to the pod (from past experience, every time I port-forwarded through the Kubernetes service, even with the direct/primaryPreferred connection options, I ended up connected to a secondary).

When running the delete query, I was foolish enough to assume I had sufficient resources for the operation (since they are shared). Now my replica set is in a crash loop because it is hitting a slow query. From my understanding (correct me if I'm wrong), when the secondaries try to sync the oplog, they don't have enough resources.
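For reference, this is roughly the batched approach I now realize I should have used instead of one mass delete: remove a bounded chunk of documents per round and pause between rounds so the secondaries can replicate each burst of oplog entries. This is only a sketch; `coll` and `query` are placeholders, and the batch size and pause are guesses, not tuned values.

```python
import time

def delete_in_batches(coll, query, batch_size=1000, pause_s=0.5):
    """Delete documents matching `query` in small batches.

    Each round fetches up to `batch_size` _ids, deletes exactly those,
    then sleeps briefly so secondaries can catch up on the oplog.
    Works with any object exposing pymongo-style find()/delete_many().
    Returns the total number of documents deleted.
    """
    total = 0
    while True:
        ids = [d["_id"] for d in coll.find(query, {"_id": 1}).limit(batch_size)]
        if not ids:
            return total
        result = coll.delete_many({"_id": {"$in": ids}})
        total += result.deleted_count
        time.sleep(pause_s)  # give replication a chance to keep up
```

(`delete_many` has no limit option, hence the fetch-ids-then-delete pattern.)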
Current pod status:
NAME READY STATUS RESTARTS AGE
mongodb-0 0/1 CrashLoopBackOff 377 (80s ago) 36h
mongodb-1 0/1 Running 1 (134m ago) 27h
mongodb-2 0/1 ContainerStatusUnknown 59 (6h ago) 11h
mongodb-arbiter-0 1/1 Running 309 (5m40s ago) 2d1h
Logs from mongodb-0:
{"t":{"$date":"2023-07-21T05:38:30.240+00:00"},"s":"I", "c":"REPL", "id":21550, "ctx":"initandlisten","msg":"Replaying stored operations from startPoint (exclusive) to endPoint (inclusive)","attr":{"startPoint":{"$timestamp":{"t":1689641543,"i":5857}},"endPoint":{"$timestamp":{"t":1689641885,"i":1}}}}
{"t":{"$date":"2023-07-21T05:38:30.377+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":1200}}
{"t":{"$date":"2023-07-21T05:38:30.415+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"initandlisten","msg":"Slow query","attr":{"type":"command","ns":"local.oplog.rs","command":{"getMore":151664126847388003,"collection":"oplog.rs","$db":"local"},"originatingCommand":{"find":"oplog.rs","filter":{"ts":{"$gte":{"$timestamp":{"t":1689641543,"i":5857}},"$lte":{"$timestamp":{"t":1689641885,"i":1}}}},"readConcern":{},"$db":"local"},"planSummary":"COLLSCAN","cursorid":151664126847388003,"keysExamined":0,"docsExamined":100510,"numYields":100,"nreturned":100509,"queryHash":"23904D31","planCacheKey":"23904D31","reslen":16777108,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":29}},"FeatureCompatibilityVersion":{"acquireCount":{"r":123,"w":18}},"ReplicationStateTransition":{"acquireCount":{"w":36}},"Global":{"acquireCount":{"r":123,"w":13,"W":5}},"Database":{"acquireCount":{"r":15,"w":12,"W":1}},"Collection":{"acquireCount":{"r":19,"w":4,"W":4}},"Mutex":{"acquireCount":{"r":34}},"oplog":{"acquireCount":{"w":1}}},"flowControl":{"acquireCount":10,"timeAcquiringMicros":19},"readConcern":{"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":368803082,"timeReadingMicros":5070664},"timeWaitingMicros":{"schemaLock":628}},"protocol":"op_msg","durationMillis":174}}
{"t":{"$date":"2023-07-21T05:38:31.581+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":1400}}
I have tried to access the cluster via port-forward to resize the oplog, hoping that would give it enough headroom to sync, but to no avail, since the pods are stuck in an infinite crash loop. I am also not sure this is the correct solution (going by https://www.mongodb.com/community/forums/t/alert-replication-oplog-window-has-gone-below-1-hours/114043/2), since growing the oplog means the cluster needs even more resources.
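For completeness, this is the resize I was attempting; it is a sketch, not something I could verify on this cluster, since no member stays up long enough. The pod name and the 4096 MB size are assumptions, not recommendations.

```shell
# Forward a local port to a (hopefully primary) member:
kubectl port-forward pod/mongodb-1 27017:27017 &

# replSetResizeOplog takes the new size in megabytes (minimum 990):
mongosh --eval 'db.adminCommand({replSetResizeOplog: 1, size: 4096})'
```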
Any suggestions on how to handle this problem would be much appreciated. Thanks!