We have a MongoDB replica set running on three nodes, and we see the following in the error logs. This is one example; the entries appear at the same time on the different nodes.

Mar 27 10:31:49

Node 1:

Mar 27 10:31:49 ulpmon01 mongod.27017[1464]: [rsHealthPoll] replSet info ulpmon03.osasunet:27017 is down (or slow to respond):
Mar 27 10:31:49 ulpmon01 mongod.27017[1464]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is now in state DOWN
Mar 27 10:31:53 ulpmon01 mongod.27017[1464]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is up
Mar 27 10:31:53 ulpmon01 mongod.27017[1464]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is now in state SECONDARY

Node 2:

Mar 27 10:31:43 ulpmon02 mongod.27017[1438]: [rsHealthPoll] DBClientCursor::init call() failed
Mar 27 10:31:43 ulpmon02 mongod.27017[1438]: [rsHealthPoll] replSet info ulpmon03.osasunet:27017 is down (or slow to respond):
Mar 27 10:31:43 ulpmon02 mongod.27017[1438]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is now in state DOWN
Mar 27 10:31:50 ulpmon02 mongod.27017[1438]: [rsHealthPoll] replset info ulpmon03.osasunet:27017 heartbeat failed, retrying
Mar 27 10:31:53 ulpmon02 mongod.27017[1438]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is up
Mar 27 10:31:53 ulpmon02 mongod.27017[1438]: [rsHealthPoll] replSet member ulpmon03.osasunet:27017 is now in state SECONDARY

Node 3:

Mar 27 10:31:53 ulpmon03 mongod.27017[1442]: [rsHealthPoll] replset info ulpmon01.osasunet:27017 thinks that we are down
Mar 27 10:31:53 ulpmon03 mongod.27017[1442]: [rsHealthPoll] replset info ulpmon02.osasunet:27017 thinks that we are down

Can anyone help?

Vince Bowdren
AER
  • Hi AER, and welcome to Stack Overflow. I took the liberty of reformatting part of your question to improve the readability; remember, on this site everybody is encouraged to [edit] and re-edit where necessary to make every question as good as possible. – Vince Bowdren Apr 20 '17 at 08:29

2 Answers


The other members report that the third node "is down (or slow to respond)", yet node 3's own log shows that it was running the whole time, so this most likely points to a network problem. Check your network setup for connectivity problems between node 3 and the other nodes.
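To judge whether these are short network blips rather than real outages, it can help to measure how long each observer considered the member down. The following is only a sketch that parses the `rsHealthPoll` state lines in the format shown in the question; the function name `down_windows` is hypothetical, and the year is an assumption because the syslog timestamps do not include one.

```python
import re
from datetime import datetime

# Matches state-change lines in the syslog format shown in the question, e.g.
# "Mar 27 10:31:49 ulpmon01 mongod.27017[1464]: [rsHealthPoll] replSet member
#  ulpmon03.osasunet:27017 is now in state DOWN"
STATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) \S+ "
    r"\[rsHealthPoll\] replSet member (?P<member>\S+) is now in state (?P<state>\w+)"
)


def down_windows(lines, year=2017):
    """Return (observer, member, seconds_down) for each DOWN -> other state transition.

    `year` is an assumption: the log timestamps carry no year.
    """
    went_down = {}  # (observer, member) -> time the member was first marked DOWN
    windows = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue  # ignore heartbeat retries, "is up" lines, and other messages
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["host"], m["member"])
        if m["state"] == "DOWN":
            went_down.setdefault(key, ts)
        elif key in went_down:
            windows.append((*key, (ts - went_down.pop(key)).total_seconds()))
    return windows
```

Fed the Node 1 and Node 2 excerpts from the question, this reports windows of a few seconds each, which is consistent with a brief connectivity or latency problem rather than a crashed `mongod`.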

Vince Bowdren

We are still investigating this issue, and we now see log messages of this type:

Apr 26 01:12:24 ulpmon01 mongod.27017[1464]: [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Apr 26 01:10:27:1 which is more than 30 seconds behind member ulpmon03.osasunet:27017 whose most recent OpTime is Apr 26 01:12:23:1

Apr 26 15:40:45 ulpmon01 mongod.27017[1464]: [rsBackgroundSync] replset setting syncSourceFeedback to ulpmon02.osasunet:27017
Apr 26 15:40:45 ulpmon01 mongod.27017[1464]: [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Apr 26 15:40:00:1 which is more than 30 seconds behind member ulpmon03.osasunet:27017 whose most recent OpTime is Apr 26 15:40:44:4
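To see how far the current sync target had fallen behind in each of these messages, the two OpTime timestamps can be subtracted. This is a rough sketch that assumes the exact message wording shown above; `sync_lag_seconds` is a hypothetical helper, and the year is an assumption because the log does not include it.

```python
import re
from datetime import datetime

# Extracts both OpTime timestamps from a "changing sync target" message.
# The trailing ":<n>" after each timestamp is the OpTime counter and is ignored here.
SYNC_RE = re.compile(
    r"current sync target's most recent OpTime is "
    r"(?P<cur>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}):\d+ "
    r"which is more than \d+ seconds behind member (?P<member>\S+) "
    r"whose most recent OpTime is (?P<best>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}):\d+"
)


def sync_lag_seconds(line, year=2017):
    """Return (more_recent_member, lag_in_seconds), or None if the line does not match.

    `year` is an assumption: the log timestamps carry no year.
    """
    m = SYNC_RE.search(line)
    if m is None:
        return None
    fmt = "%Y %b %d %H:%M:%S"
    cur = datetime.strptime(f"{year} {m['cur']}", fmt)
    best = datetime.strptime(f"{year} {m['best']}", fmt)
    return m["member"], (best - cur).total_seconds()
```

For the two messages above this gives lags of 116 and 44 seconds relative to ulpmon03.osasunet:27017.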

AER