I'm trying to build a scalable ReplicaSet via Apache Mesos and Marathon. To do that, I created a Dockerfile which contains MongoDB 3.0.7 and a Node.js application. The application registers itself with the eventSubscriptions API of Marathon, meaning that it reacts to events from Marathon.
The events are filtered by application: the first node coming up triggers a ReplicaSet init, and each further node that comes up is added to the ReplicaSet as a member.
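A minimal sketch of that filtering, for illustration only. It assumes the events are Marathon status_update_event payloads (with eventType, appId, taskStatus, host and ports fields); the function name decideAction, the knownMembers list and the action labels are hypothetical and not taken from the actual application.

```javascript
// Hypothetical helper: decide what to do for a single Marathon event.
// knownMembers is the list of "host:port" strings already in the ReplicaSet.
function decideAction(event, appId, knownMembers) {
  // Only "task is running" updates for our own application are relevant.
  if (event.eventType !== 'status_update_event') return { action: 'ignore' };
  if (event.appId !== appId || event.taskStatus !== 'TASK_RUNNING') {
    return { action: 'ignore' };
  }
  // Without a port there is nothing to register.
  if (!Array.isArray(event.ports) || event.ports.length === 0) {
    return { action: 'ignore' };
  }
  var member = event.host + ':' + event.ports[0];
  // Skip tasks that are already part of the ReplicaSet.
  if (knownMembers.indexOf(member) !== -1) return { action: 'ignore' };
  // First node: initiate the ReplicaSet. Later nodes: add them as members.
  return {
    action: knownMembers.length === 0 ? 'initiate' : 'add',
    member: member
  };
}
```

With an empty knownMembers list the first running task yields an 'initiate' action; any later, not yet known task yields 'add'.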
Initialization works flawlessly, but when I try to add the next nodes to the ReplicaSet, MongoDB reports an error:
2015-10-22T08:52:58.639+0000 I REPL [conn18] replSetReconfig admin command received from client
2015-10-22T08:52:58.641+0000 W NETWORK [conn18] Failed to connect to 192.168.200.167:31069, reason: errno:111 Connection refused
2015-10-22T08:52:58.641+0000 I REPL [conn18] replSetReconfig config object with 2 members parses ok
2015-10-22T08:52:58.641+0000 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.200.167:31069, reason: errno:111 Connection refused
2015-10-22T08:52:58.691+0000 W REPL [ReplicationExecutor] Failed to complete heartbeat request to 192.168.200.167:31069; Location18915 Failed attempt to connect to 192.168.200.167:31069; couldn't connect to server 192.168.200.167:31069 (192.168.200.167), connection attempt failed
2015-10-22T08:52:58.691+0000 E REPL [conn18] replSetReconfig failed; NodeNotFound Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 192.168.200.168:31970; the following nodes did not respond affirmatively: 192.168.200.167:31069 failed with Failed attempt to connect to 192.168.200.167:31069; couldn't connect to server 192.168.200.167:31069 (192.168.200.167), connection attempt failed
I verified the connectivity, and I can successfully connect to the host and port from the error message:
$ mongo --host 192.168.200.167 --port 31069
MongoDB shell version: 3.0.7
connecting to: 192.168.200.167:31069/test
Server has startup warnings:
2015-10-22T08:52:59.212+0000 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
>
So, to me the connectivity seems to be there. Next, I checked whether the new ReplicaSet configuration I create for the reconfiguration is correct:
{
  "_id": "rs0",
  "version": 2,
  "members": [
    {
      "_id": 0,
      "host": "192.168.200.168:31970",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 1,
      "host": "192.168.200.167:31069"
    }
  ],
  "settings": {
    "chainingAllowed": true,
    "heartbeatTimeoutSecs": 30,
    "getLastErrorModes": {},
    "getLastErrorDefaults": {
      "w": 1,
      "wtimeout": 0
    }
  }
}
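For illustration, a config of this shape can be derived from the current one roughly as sketched below. The helper name addMember is hypothetical, and the sketch assumes the current config object is already at hand (for example from the replSetGetConfig command); it is not necessarily how the application builds it.

```javascript
// Hypothetical helper: derive the next ReplicaSet config from the current one
// by appending a new member given as a "host:port" string.
function addMember(currentConfig, host) {
  // Deep-copy so the caller's config object is not mutated.
  var next = JSON.parse(JSON.stringify(currentConfig));
  // Each member needs a unique _id; use the highest existing one plus one.
  var maxId = next.members.reduce(function (max, m) {
    return Math.max(max, m._id);
  }, -1);
  next.members.push({ _id: maxId + 1, host: host });
  // replSetReconfig expects a version higher than the current one.
  next.version = currentConfig.version + 1;
  return next;
}
```

Applied to a version 1 config with the single member 192.168.200.168:31970 and the new host 192.168.200.167:31069, this produces a version 2 config with two members, matching the structure shown above.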
This config is applied by issuing
db.admin().command({replSetReconfig: myConfig}, function(error, newConfigResult) { ... });
This configuration triggers the above error in the MongoDB logs, and the following one in the Node.js application log:
{
  "name": "MongoError",
  "message": "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 192.168.200.168:31970; the following nodes did not respond affirmatively: 192.168.200.167:31069 failed with Failed attempt to connect to 192.168.200.167:31069; couldn't connect to server 192.168.200.167:31069 (192.168.200.167), connection attempt failed",
  "ok": 0,
  "errmsg": "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 192.168.200.168:31970; the following nodes did not respond affirmatively: 192.168.200.167:31069 failed with Failed attempt to connect to 192.168.200.167:31069; couldn't connect to server 192.168.200.167:31069 (192.168.200.167), connection attempt failed",
  "code": 74
}
Now, the even stranger thing is that if I take the same configuration and run it directly in the primary's MongoDB shell via
db.runCommand({replSetReconfig: myConfigFromAbove});
it also works... Does anyone have an idea what the actual problem could be? Thanks a lot in advance!