
I am running a multiprocessing pool, mapped over a number of inputs. My worker processes have an initialization step that spins up a connection to Selenium and a database. When the pool finishes its job, what is the graceful way to close these connections, rather than just relying on Python's memory management and `__del__` methods?

EDIT:

class WebDriver():
  def close(self):
    # close logic
    ...

  def __del__(self):
    self.driver.close()

def init():
  global DRIVER
  DRIVER=WebDriver()

def shutdown():
  DRIVER.close()

if __name__=='__main__':
  with multiprocessing.Pool(initializer=init) as pool:
    pool.map(some_function, some_args)

Because some_args is large, I only want to call shutdown when the worker processes have no other jobs to do. I don't want to close / reopen connections to my database until everything is done.

As of right now, I would expect the memory manager to call `__del__` when a worker process shuts down, but I don't know whether that actually happens. I've seen strange scenarios where it wasn't called. I'm hoping to better understand how to manage shutdown.

Jason Kang
  • Do you mean when the pool or processes are closed? They are separate processes; when they are closed, all associated resources are released – Iain Shelvington Dec 29 '21 at 04:35
  • I thought that the worker processes are closed only when the pool is finished with its job? If associated resources are released, does that mean things like db connections are automatically closed? I thought you would have to manually close connections? – Jason Kang Dec 29 '21 at 05:15
  • Sorry I interpreted this part of the question "When the pool finishes its job..." as meaning the pool was finished/closed. Yes resources released means all connections would be closed. Can you show some of your code, including what you would consider graceful? – Iain Shelvington Dec 29 '21 at 05:18
  • Yes - I've updated my OP. – Jason Kang Dec 29 '21 at 05:53

1 Answer


I think you have a good chance of closing your drivers if you first wait for your pool processes to terminate and then force a garbage collection:

if __name__=='__main__':
    with multiprocessing.Pool(initializer=init) as pool:
        try:
            pool.map(some_function, some_args)
        finally:
            # Wait for all tasks to complete and all processes to terminate:
            pool.close()
            pool.join()
            # Processes should be done now:
            import gc
            gc.collect() # ensure garbage collection
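Alternatively, rather than relying on garbage collection at all, the cleanup can be registered inside the pool initializer. This is a sketch (not from the original answer) that uses the semi-internal but long-stable `multiprocessing.util.Finalize` helper; finalizers registered this way run when a worker process exits normally after `close()`/`join()` (a `terminate()` would skip them):

```python
import multiprocessing
from multiprocessing.util import Finalize  # semi-internal, but stable


class WebDriver:
    def __init__(self):
        self.driver = []  # stand-in for a real selenium driver

    def close(self):
        print('driver is now closed', flush=True)


def init():
    global DRIVER
    DRIVER = WebDriver()
    # Register DRIVER.close to run when this worker process shuts
    # down normally; Finalize hooks into multiprocessing's own exit
    # machinery, so no explicit "deinitializer" call is needed.
    Finalize(DRIVER, DRIVER.close, exitpriority=16)


def some_function(i):
    return i * i


if __name__ == '__main__':
    pool = multiprocessing.Pool(2, initializer=init)
    try:
        results = pool.map(some_function, range(8))
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # workers exit normally, running their finalizers
    print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```

Because `Finalize` is not part of the documented public API, treat this as a pragmatic workaround rather than a guaranteed contract.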

Solution With User-created Pool

import multiprocessing


class WebDriver():

    def close(self):
        ...
        print('driver is now closed')

    def do_something(self, i):
        import time
        time.sleep(.1)
        print(i, flush=True)

    def __enter__(self):
        self.driver = [] # this would be an actual driver
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()



def some_function(i):
    # Do something with DRIVER:
    ...
    DRIVER.do_something(i)

def worker(in_q):
    global DRIVER

    with WebDriver() as DRIVER:
        # Iterate until we get special None record and then cleanup:
        for i in iter(in_q.get, None):
            try:
                some_function(i)
            except BaseException:
                # Ignore task errors so the worker keeps draining the queue
                pass

if __name__=='__main__':
    POOL_SIZE = multiprocessing.cpu_count()
    # Create pool:
    # Assumption is that we don't need an output queue for output
    in_q = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(in_q,))
                 for _ in range(POOL_SIZE)
                 ]
    for p in processes:
        p.start()
    # Write arguments to input_queue:
    some_args = range(16)
    for arg in some_args:
        in_q.put(arg)
    # Now write POOL_SIZE "quit" messages:
    for _ in range(POOL_SIZE):
        in_q.put(None)
    # Wait for processes to terminate:
    for p in processes:
        p.join()

Prints:

0
1
2
3
4
5
6
7
8
driver is now closed
9
driver is now closed
10
driver is now closed
11
driver is now closed
12
driver is now closed
14
13
driver is now closed
driver is now closed
15
driver is now closed
Booboo
  • I believe it's un-pythonic to rely on `__del__` for resource management. I can't use a context manager here, is there any other way I can explicitly deallocate my resource? – Jason Kang Dec 29 '21 at 19:51
  • Also, I've seen that if my pool.map function throws an error, the program exits without cleaning up its resources. It has not called self.driver.close(). Is this expected? – Jason Kang Dec 29 '21 at 20:06
  • The problem is that you need to have the drivers closed when the processes terminate, and I don't know of any hooks built into the multiprocessing pool that would be an analog of the *initializer* argument on the `Pool` constructor, such as a *deinitializer* argument specifying a function to be run in each process at termination time that would call `close` on the global `DRIVER`. And there is no way I see where a context manager can be used. As far as `map` raising an exception, use a `try/finally` with the `map` in the `try` and the next 4 statements in the `finally`. – Booboo Dec 29 '21 at 20:17
  • There is a solution that doesn't use a multiprocessing pool but rather just `multiprocessing.Process` instances and a `multiprocessing.Queue` instance, in essence creating your own pool. But I believe my answer works, whether it is "pythonic" or not. It is as pythonic as you can get under the circumstances *as far as my knowledge permits*. – Booboo Dec 29 '21 at 20:20
  • What is the point of running `gc.collect`? Isn't the parent process unable to see the subprocess' memory? How will running `gc.collect` clean up our subprocesses? – Jason Kang Dec 29 '21 at 20:24
  • You are right that the `gc.collect` probably does not contribute much, since it only runs in the current process, but it might help indirectly: it delays the final termination and gives the garbage collectors in the other processes a chance to run. Of course, just sleeping for a bit would have the same effect. See the update to the answer. – Booboo Dec 29 '21 at 20:46
  • Now you can take the second solution and modify your WebDriver class to be a context manager and use that in `worker`. – Booboo Dec 29 '21 at 20:49
  • I did it for you. – Booboo Dec 29 '21 at 20:56
  • As far as the `gc.collect` is concerned, I had copied this code from a [similar problem that used multithreading](https://stackoverflow.com/questions/70447389/python-multiprocessing-a-class#70450297), in which case it made perfect sense. That, by the way, is a solution for a multithreading pool. – Booboo Dec 29 '21 at 21:05
  • Interesting, this works. I think what's happening is that the `__exit__` definition for pool doesn't implicitly do any `.join()` on the worker processes, so we need to explicitly call `.join()` if the processes exit before finishing. This is an unexpected result but a [known issue](https://stackoverflow.com/questions/55035333/use-python-pool-with-context-manager-or-close-and-join). – Jason Kang Dec 30 '21 at 00:08
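The point in that last comment can be sketched directly: `Pool.__exit__` calls `terminate()` rather than `close()`/`join()`, so any cleanup that must run inside the workers needs an explicit `close()` and `join()` before the `with` block ends (a minimal illustration, not the OP's actual code):

```python
import multiprocessing


def square(i):
    return i * i


if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        results = pool.map(square, range(4))
        # Without these two calls, leaving the with-block invokes
        # pool.terminate(), killing workers before any per-process
        # cleanup (__del__, finalizers) has a chance to run.
        pool.close()
        pool.join()
    print(results)  # → [0, 1, 4, 9]
```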