We faced an issue with replication of a MongoDB replica set in production, so I rebuilt the replica set by deleting the local database and re-initiating it with just the primary node, so that the production application could stay up until we solve the underlying problem. I deleted the local database from the MongoDB primary node and the arbiter, and ended up with a replica set consisting of a single primary node.
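Roughly, the rebuild looked like the following (a minimal sketch, not the exact commands: the port and "primary-host" are placeholders, authentication options are omitted, and each mongod was temporarily restarted without the --replSet option before the local database could be dropped):

# on each node (primary and arbiter), with mongod temporarily running as a standalone:
mongo --port 27017 --eval 'db.getSiblingDB("local").dropDatabase()'

# then, after restarting the primary's mongod with --replSet UHI-PROD,
# initiate a one-member replica set:
mongo --port 27017 --eval 'rs.initiate({_id: "UHI-PROD", members: [{_id: 0, host: "primary-host:27017"}]})'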
However, I then ran into another issue: the replica set is still trying to reach the arbiter node even though it is not shown in the rs.status() output. This caused a "too many open files" error, and then the database went down.
I increased the open files limit and stopped the arbiter node, but the logs still show the primary trying to connect to the arbiter. How can I resolve this until we are able to rebuild the replication with the secondary and arbiter nodes?
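For reference, this is how I check the current replica set configuration on the primary to confirm the arbiter is no longer in the member list (a sketch; port is a placeholder, authentication options omitted):

mongo --port 27017 --eval 'printjson(rs.conf().members)'
# only the single primary member is listed, e.g.
# [ { "_id" : 0, "host" : "primary-host:27017", ... } ]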
Logs:
2023-07-10T02:04:55.287+0300 I CONNPOOL [Replication] Dropping all pooled connections to ArbiterID:27017 due to HostUnreachable: Error connecting to 10.202.2.48:27017 :: caused by :: Too many open files
2023-07-10T02:04:55.287+0300 E - [Replication] cannot open /dev/urandom Too many open files
2023-07-10T02:04:55.287+0300 F - [Replication] Fatal Assertion 28839 at src/mongo/platform/random.cpp 161
2023-07-10T13:01:09.802+0300 I REPL_HB [replexec-222] Error in heartbeat (requestId: 16678428) to ArbiterID:27017, response status: HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
2023-07-10T13:01:09.802+0300 I ASIO [Replication] Connecting to ArbiterID:27017
2023-07-10T13:01:09.802+0300 I ASIO [Replication] Failed to connect to ArbiterID:27017 - HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
2023-07-10T13:01:09.802+0300 I CONNPOOL [Replication] Dropping all pooled connections to ArbiterID:27017 due to HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
Result of the rs.status() command:
{
    "set" : "UHI-PROD",
    "date" : ISODate("2023-07-10T10:02:35.295Z"),
    "myState" : 1,
    "term" : NumberLong(4),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1688983352, 4742),
            "t" : NumberLong(4)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1688983352, 4742),
            "t" : NumberLong(4)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1688983352, 4742),
            "t" : NumberLong(4)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1688983352, 4742),
            "t" : NumberLong(4)
        }
    },
    "lastStableCheckpointTimestamp" : Timestamp(1688983291, 2),
    "electionCandidateMetrics" : {
        "lastElectionReason" : "electionTimeout",
        "lastElectionDate" : ISODate("2023-07-10T07:45:09.406Z"),
        "electionTerm" : NumberLong(4),
        "lastCommittedOpTimeAtElection" : {
            "ts" : Timestamp(0, 0),
            "t" : NumberLong(-1)
        },
        "lastSeenOpTimeAtElection" : {
            "ts" : Timestamp(1688943890, 8),
            "t" : NumberLong(3)
        },
        "numVotesNeeded" : 1,
        "priorityAtElection" : 1,
        "electionTimeoutMillis" : NumberLong(10000),
        "newTermStartDate" : ISODate("2023-07-10T07:45:09.413Z"),
        "wMajorityWriteAvailabilityDate" : ISODate("2023-07-10T07:45:09.500Z")
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "***:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 8249,
            "optime" : {
                "ts" : Timestamp(1688983352, 4742),
                "t" : NumberLong(4)
            },
            "optimeDate" : ISODate("2023-07-10T10:02:32Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1688975109, 1),
            "electionDate" : ISODate("2023-07-10T07:45:09Z"),
            "configVersion" : 2,
            "self" : true,
            "lastHeartbeatMessage" : ""
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1688983352, 4742),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1688983352, 4742),
        "signature" : {
            "hash" : BinData(0,"s7Diwi3gtsej746SjJOzhIkdp/E="),
            "keyId" : NumberLong("7195139362714025986")
        }
    }
}
As a temporary workaround, I have increased the open files limit until we can solve this properly.
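The limit was raised along these lines (a sketch; 64000 is only an example value, not necessarily what we set):

# for the current shell session (applies to processes started from it):
ulimit -n 64000

# for a systemd-managed mongod service, set the limit in the unit file / override:
#   [Service]
#   LimitNOFILE=64000
# then reload and restart:
systemctl daemon-reload
systemctl restart mongod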