I'm getting some unexpected behaviour when connecting to MongoDB Atlas from Docker containers (ECS Fargate) through a NAT Gateway.

I have three NAT gateways with assigned Elastic IPs. Those IPs are whitelisted at MongoDB Atlas, and there is no trouble with the initial connections from the Docker containers running our applications. Everything works as expected for a seemingly random amount of time (sometimes up to hours).

However, eventually one or more of our containers fails to connect to the database with the following error:

MongooseServerSelectionError: connection <monitor> to 13.49.31.245:<omitted> timed out
    at Function.Model.$wrapCallback (/app/node_modules/mongoose/lib/model.js:5095:32)
    at /app/node_modules/mongoose/lib/query.js:4510:21
    at /app/node_modules/mongoose/lib/helpers/promiseOrCallback.js:32:5
    at new Promise (<anonymous>)
    at promiseOrCallback (/app/node_modules/mongoose/lib/helpers/promiseOrCallback.js:31:10)
    at model.Query.exec (/app/node_modules/mongoose/lib/query.js:4509:10)
    at model.Query.Query.exec (/app/node_modules/mongoose-deep-populate/lib/plugin.js:50:22)
    at model.Query.Query.catch (/app/node_modules/mongoose/lib/query.js:4608:15)
    at <omitted> (<omitted>:721:11)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async <omitted> (<omitted>:37:21)
    at async <omitted>:76:21 {

    reason: TopologyDescription {
        type: 'ReplicaSetNoPrimary',
        setName: '<omitted>',
        maxSetVersion: 2,
        maxElectionId: 7fffffff0000000000000039,
        servers: [Map],
        stale: false,
        compatible: true,
        compatibilityError: null,
        logicalSessionTimeoutMinutes: 30,
        heartbeatFrequencyMS: 10000,
        localThresholdMS: 15,
        commonWireVersion: 17
    }
}

The stated IP address (13.49.31.245) is not one of our Elastic IPs and, as far as I can see, our AWS account has no association with it. It does, however, come from the same region our applications are hosted in. Why is this unknown IP address used even though we have assigned Elastic IPs?

Patrik Bäckström
  • I would suspect that you have accidentally put some containers into e.g. public or otherwise differently configured subnets. Or do existing and running containers suddenly fail vs. only newly started containers? – luk2302 Jun 19 '23 at 07:31
  • All the containers run in the private subnets they are intended to, and none of them has its own public IP, so they cannot communicate with our database without one of the NATs. I have verified that the Fargate services can only create tasks in the private subnets. I did reload all my containers after a big update this weekend and since then they have had this issue, so I guess we can consider them all to be newly started (but again, they can make the initial connection and function for a very long time before breaking). – Patrik Bäckström Jun 19 '23 at 10:55
  • Is that error message coming from the client or server? It appears to me to be a client message, and indicates that the _server_ address is `13.49.31.245`, so it should _not_ be an Elastic IP address. Instead, I would look in my DNS and/or application configuration. – kdgregory Jun 19 '23 at 12:30
  • The error is logged in a Node/Express application within Docker connecting through mongoose. However, the database cluster (MongoDB Atlas) is hosted in the region the "unknown" IP address points to. So you're probably correct that the IP is the MongoDB server. I'll open a support ticket with MongoDB. – Patrik Bäckström Jun 19 '23 at 12:53

1 Answer

It seems that bad error handling was the issue all along.

I recently implemented a graceful shutdown procedure to properly clean up MongoDB connections. This procedure was also called when MongoDB didn't respond within about 30 seconds. In that case the shutdown procedure tried to close the connections and failed, because there was no connection to close.

The procedure then stalled on a pending await, but the API still accepted incoming requests, which failed when they tried to reach MongoDB.

I have corrected the shutdown procedure and no longer receive the error above.
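
The corrected procedure is not shown here, so the following is only a minimal sketch of one way to avoid that kind of stall, not the code actually used. It assumes the fix amounts to two things: stop accepting requests before touching the database, and never await the database close without a deadline. The names `withTimeout` and `shutdown` and the `timeoutMs` and `exit` options are hypothetical; `closeDb` stands for whatever closes the connection (for example `() => mongoose.connection.close()`).

```javascript
// Hypothetical helper: settle within `ms` even if `promise` never settles.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `server` is anything exposing close(callback), such as an http.Server.
// `closeDb` returns a promise, e.g. () => mongoose.connection.close() (assumed).
// `exit` is injectable so the function can be exercised without killing the process.
async function shutdown({ server, closeDb, timeoutMs = 5000, exit = process.exit }) {
  // Stop accepting new requests first, so none are served against a dead database.
  const serverClosed = new Promise((resolve) => server.close(() => resolve()));
  let code = 0;
  try {
    // Bound the whole cleanup: a close() that hangs or throws must not stall shutdown.
    await withTimeout(Promise.all([serverClosed, closeDb()]), timeoutMs, 'shutdown');
  } catch (err) {
    console.error('Forced shutdown:', err.message);
    code = 1;
  }
  exit(code);
  return code;
}
```

With this shape, a close call that never resolves ends in a non-zero exit after `timeoutMs`, which lets the orchestrator (ECS in this case) replace the task instead of leaving a container that accepts requests it cannot serve.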

Thanks @kdgregory for pointing me in the right direction.

desertnaut
Patrik Bäckström