I'm currently working on distributed computing. My workers return their results by inserting them into a MongoDB database. The code works, but the connections remain open and at some point my system runs out of sockets. Here is my worker code:

def worker(elt):
    client = pymongo.MongoClient(MONGODB_URI)
    db = client.get_default_database()
    essaiElt = db['essaiElt']
    # compute here
    essaiElt.insert(elt.toDict())
    client.close()

Using the command "netstat -anbo" I can see that the sockets are still open (more than 3,000). The maximum number of workers is 14, but they have to handle more than 10,000 tasks.

...
TCP 10.130.151.11:4999 10.130.137.128:27017 En attente 0
TCP 10.130.151.11:5000 10.130.137.128:27017 En attente 0

I've tried setting timeouts, but it has no effect.

How can I close the sockets without restarting my database?

Python 2.7.12, PyMongo 3.3, MongoDB 3.2.10

  • How long does the "compute here" section take to execute, please? Does a single Python process insert many documents into the database, or only one, before the process exits? – A. Jesse Jiryu Davis Dec 07 '16 at 15:18
  • How long does the "compute here" section take to execute, please? Actually, it is empty. Does a single Python process insert many documents into the database, or only one, before the process exits? The worker is managed by the pp library (Parallel Python), so it gets a task, creates a socket, inserts the element, closes the socket, and grabs another task to do the same again. So one worker eventually creates a lot of sockets. – YOLO SWAG SHOT Dec 07 '16 at 16:14

1 Answer

What's likely happening is, you create a client, insert a document, and close the client, many times per second. A MongoClient can take a second or two to complete its shutdown process. (A MongoClient starts a background thread per server, and these threads don't exit instantly.) Even once the MongoClient has completely closed its sockets, the MongoDB server takes seconds to clean up all resources related to the TCP connection, and the OS's network layer takes minutes to clean up. (See the TIME-WAIT state in Wikipedia's TCP entry.)

Generally, you should create one MongoClient at the beginning of your Python process, and use the one MongoClient throughout that Python process lifetime:

client = pymongo.MongoClient(MONGODB_URI)

def worker(elt):
    db = client.get_default_database()
    essaiElt = db['essaiElt']
    # compute here
    essaiElt.insert(elt.toDict())

Don't create a new MongoClient per operation. Never close it.

See also the PyMongo FAQ:

Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.

A. Jesse Jiryu Davis
  • "Never close it." — When does it get closed? – A. Vidor Apr 12 '17 at 20:16
  • It's closed automatically when your Python process exits. There is no reason to close a MongoClient before then. – A. Jesse Jiryu Davis Apr 13 '17 at 02:54
  • That is valuable information! Are you saying that the `pymongo` devs expose the `.close` method out of thoroughness, or is "never" an exaggeration and it has legitimate use cases? – A. Vidor Apr 14 '17 at 15:16
  • I don't know why we ever exposed the "close" method, please just forget it exists. =) – A. Jesse Jiryu Davis Apr 14 '17 at 19:37
  • Not seeing a reason doesn't negate others needing it. I'm working on a process that creates a lab which creates a connection to mongo. When the lab is done, I want the connection to be done too. Without redesigning the fundamental way the lab uses pre-existing components (including a mongo connector) -- CLOSE() when one lab is done so I can run the next is what I need and want to use, and it doesn't work. I should ALWAYS be ABLE to clean up any resource I create without closing my interpreter down. – Penumbra Sep 03 '19 at 20:57