
We hit a replication problem with our MongoDB replica set in production, so I rebuilt the replica set: I deleted the local database and initiated the set again with only the primary node, so that the production application can stay up until we solve the underlying problem. I deleted the local database on both the primary node and the arbiter, and the replica set now has just one primary node.

But I then ran into another issue: the replica set is still trying to reach the arbiter node even though the arbiter is not shown in the rs.status() output. This caused a "Too many open files" error, and then the database went down.

I increased the open-files limit and stopped the arbiter node, but the logs still show the primary trying to connect to the arbiter. How can I solve this until we can rebuild the replication with secondary and arbiter nodes?

Logs:

2023-07-10T02:04:55.287+0300 I CONNPOOL [Replication] Dropping all pooled connections to ArbiterID:27017 due to HostUnreachable: Error connecting to 10.202.2.48:27017 :: caused by :: Too many open files
2023-07-10T02:04:55.287+0300 E -        [Replication] cannot open /dev/urandom Too many open files
2023-07-10T02:04:55.287+0300 F -        [Replication] Fatal Assertion 28839 at src/mongo/platform/random.cpp 161
2023-07-10T13:01:09.802+0300 I REPL_HB  [replexec-222] Error in heartbeat (requestId: 16678428) to ArbiterID:27017, response status: HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
2023-07-10T13:01:09.802+0300 I ASIO     [Replication] Connecting to ArbiterID:27017
2023-07-10T13:01:09.802+0300 I ASIO     [Replication] Failed to connect to ArbiterID:27017 - HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
2023-07-10T13:01:09.802+0300 I CONNPOOL [Replication] Dropping all pooled connections to ArbiterID:27017 due to HostUnreachable: Error connecting to ArbiterID:27017 :: caused by :: Connection refused
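
To see how often each host is being retried, the "Failed to connect" lines above can be tallied with a short script. This is only a sketch, assuming Node.js is available and the log text has already been read into a string. The function name tallyUnreachableHosts is hypothetical, and the pattern is matched against the line format shown in these logs only.

```javascript
// Hypothetical helper: count failed replication connections per host and error reason.
function tallyUnreachableHosts(logText) {
  // Matches lines such as:
  //   "... Failed to connect to ArbiterID:27017 - HostUnreachable: ... :: caused by :: Connection refused"
  const pattern = /Failed to connect to (\S+) - HostUnreachable: .*:: caused by :: (.+)$/;
  const tally = {};
  for (const line of logText.split("\n")) {
    const match = pattern.exec(line);
    if (match === null) continue; // not a failed-connection line
    const key = `${match[1]} (${match[2].trim()})`;
    tally[key] = (tally[key] || 0) + 1;
  }
  return tally;
}
```

Calling it on the log text returns an object whose keys combine the host and the error reason, so repeated "Connection refused" attempts against the arbiter show up as a single growing counter.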

Result of the rs.status() command:

{
        "set" : "UHI-PROD",
        "date" : ISODate("2023-07-10T10:02:35.295Z"),
        "myState" : 1,
        "term" : NumberLong(4),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1688983352, 4742),
                        "t" : NumberLong(4)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1688983352, 4742),
                        "t" : NumberLong(4)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1688983352, 4742),
                        "t" : NumberLong(4)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1688983352, 4742),
                        "t" : NumberLong(4)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1688983291, 2),
        "electionCandidateMetrics" : {
                "lastElectionReason" : "electionTimeout",
                "lastElectionDate" : ISODate("2023-07-10T07:45:09.406Z"),
                "electionTerm" : NumberLong(4),
                "lastCommittedOpTimeAtElection" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "lastSeenOpTimeAtElection" : {
                        "ts" : Timestamp(1688943890, 8),
                        "t" : NumberLong(3)
                },
                "numVotesNeeded" : 1,
                "priorityAtElection" : 1,
                "electionTimeoutMillis" : NumberLong(10000),
                "newTermStartDate" : ISODate("2023-07-10T07:45:09.413Z"),
                "wMajorityWriteAvailabilityDate" : ISODate("2023-07-10T07:45:09.500Z")
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "***:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 8249,
                        "optime" : {
                                "ts" : Timestamp(1688983352, 4742),
                                "t" : NumberLong(4)
                        },
                        "optimeDate" : ISODate("2023-07-10T10:02:32Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1688975109, 1),
                        "electionDate" : ISODate("2023-07-10T07:45:09Z"),
                        "configVersion" : 2,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1688983352, 4742),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1688983352, 4742),
                "signature" : {
                        "hash" : BinData(0,"s7Diwi3gtsej746SjJOzhIkdp/E="),
                        "keyId" : NumberLong("7195139362714025986")
                }
        }
}
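
The mismatch between this output and the logs can be checked mechanically by comparing the configured member hosts with the hosts that appear in the log lines. The following is a sketch under assumptions, not a fix: the function name hostsOutsideConfig is hypothetical, the host names are placeholders (the real member name is redacted above), and it assumes Node.js for running the script.

```javascript
// Hypothetical helper: list hosts that show up in the logs but are not replica set members.
function hostsOutsideConfig(memberHosts, loggedHosts) {
  const members = new Set(memberHosts);
  // Deduplicate the logged hosts, then keep only those missing from the config.
  return [...new Set(loggedHosts)].filter((host) => !members.has(host));
}

// In the mongo shell, memberHosts could come from rs.conf().members.map(m => m.host).
// The values below are placeholders, not the real host names.
const stale = hostsOutsideConfig(
  ["primary.example:27017"],
  ["ArbiterID:27017", "ArbiterID:27017", "primary.example:27017"]
);
console.log(stale);
```

Any host this returns is one that the node still tries to contact even though the current replica set configuration does not list it, which is the situation with the arbiter here.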

I increased the open-files limit as a temporary measure until we can solve it.
