
While researching some strange issues with my Python web application (in particular, issues regarding MongoDB connectivity), I noticed something on the official PyMongo documentation page. My web application uses Flask, but this shouldn't influence the issue I'm facing.

The PyMongo driver does connection pooling, but it also throws an exception (AutoReconnect) when a connection is stale and a reconnect is due.

Regarding the AutoReconnect exception, it states:

> In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).
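
In practice, the handling the documentation asks for looks something like this (a minimal sketch; the connection string, database, and collection names are placeholders):

```python
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient("mongodb://localhost:27017/")  # placeholder URI
users = client.mydb.users  # placeholder database/collection

try:
    doc = users.find_one({"user_id": 42})
except AutoReconnect:
    # The operation may or may not have reached the server; the same
    # client will try to open a new connection on the next operation.
    doc = None
```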

I have noticed that this actually happens constantly (and it doesn't seem to be an error). Connections are closed by the MongoDB server after what seems like several minutes of inactivity, and need to be recreated by the web application.

What I don't understand is why the PyMongo driver throws an error when it reconnects (which the user of the driver needs to handle themselves), instead of doing it transparently. (There could even be an option a user could set so that AutoReconnect exceptions do get thrown, but wouldn't a sensible default be for these exceptions not to be thrown at all, with connections recreated seamlessly?)

I have never encountered this behavior using other database systems, which is why I'm a bit confused.

It's also worth mentioning that my web application's MongoDB connections never fail when connecting to my local development MongoDB server (I assume it would have something to do with the fact that it's a local connection, and that the connection is done through a UNIX socket instead of a network socket, but I could be wrong).

Andrei Bârsan
  • The Python driver developers have addressed AutoReconnect a few times in JIRA issues. Take a look at [PYTHON-197](https://jira.mongodb.org/browse/PYTHON-197), for starters. If you read over what's already in JIRA about AutoReconnect and aren't satisfied, I'd open a PYTHON ticket. – wdberkeley Mar 02 '15 at 15:40
  • Thanks for pointing that out! It feels as if non-power-users such as myself, who don't operate an entire replicated mongo cluster, have been left out. Do you have any idea why those connections actually time out? Shouldn't they fail right away if the socket gets closed? – Andrei Bârsan Mar 02 '15 at 16:08
  • I couldn't say anything about why the connections are failing without a lot more information. Is there anything in the mongod logs about connections being closed? – wdberkeley Mar 02 '15 at 16:13
  • Nope. All I see are `authenticate db` events (which seem to be associated with new connections) and the regular queries. Also, would there be a way to differentiate between a timeout (i.e. a bad connection) and an actual long-running (1-5s) query? – Andrei Bârsan Mar 02 '15 at 16:21

1 Answer


You're misunderstanding AutoReconnect. It is raised when the driver attempts to communicate with the server (to send a command or other operation) and a network failure or similar problem occurs. The name of the exception is meant to communicate that you do not have to create a new instance of MongoClient; the existing client will attempt to reconnect automatically when your application tries the next operation. If the same problem occurs, AutoReconnect is raised again.
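
In other words, you keep using the same client and simply retry the operation. A hypothetical retry helper might look like this (the retry count and backoff are arbitrary, and retrying is only safe for idempotent operations, since the failed attempt may or may not have reached the server):

```python
import time

from pymongo.errors import AutoReconnect

def find_one_with_retry(collection, query, retries=3, delay=0.5):
    """Retry a read when AutoReconnect is raised; the existing client
    reconnects on each new attempt, so no new MongoClient is needed."""
    for attempt in range(retries):
        try:
            return collection.find_one(query)
        except AutoReconnect:
            if attempt == retries - 1:
                raise  # still failing after all retries; give up
            time.sleep(delay * (attempt + 1))  # simple linear backoff
```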

I suspect the reason you are seeing sockets timeout (and AutoReconnect being raised) is that there is a load balancer between the server and your application that closes connections after some period of inactivity. For example, this apparently happens on Microsoft's Azure platform after 13 minutes of no activity on a socket. You might be able to fix this by using the socketKeepAlive option, added in PyMongo 2.8. Note that you will also have to set the keepalive interval on your application server to an appropriate value (the default on Linux is 2 hours). See here for more information.
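
As a sketch, assuming PyMongo 2.8 or later and a placeholder connection string:

```python
from pymongo import MongoClient

# socketKeepAlive makes the driver enable TCP keepalive on its sockets,
# so an idle connection is less likely to be dropped by a load balancer.
client = MongoClient(
    "mongodb://my-app-host:27017/",  # placeholder host
    socketKeepAlive=True,
)
```

The keepalive probes themselves are governed by the operating system, so the probe interval (`net.ipv4.tcp_keepalive_time` on Linux, 7200 seconds by default) also needs to be lowered below the load balancer's idle timeout.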

Bernie Hackett
  • That was precisely the issue (it was Azure-specific). We're on AWS now and I haven't seen this issue any more. Thanks for the info! – Andrei Bârsan Apr 06 '15 at 07:04
  • Your answer makes me think of an issue I have with long background jobs that rely on MongoDB or RabbitMQ: connections get lost for no apparent reason. The issue only exists in my Docker Swarm environment, not locally when I run standalone containers; the only difference I see is the overlay network and how it might affect connections between different containers. Thank you so much, now I have something to start with. – Iliyass Hamza Sep 11 '20 at 10:27
  • @Bernie Hackett what happens if you have multiple hosts in the MongoClient? – Mark Nov 12 '21 at 13:14