
In the code I am writing, I have been using both threads and a process forked with `multiprocessing`:

  • threads for a websocket connection (and some other background tasks)
  • a `multiprocessing` fork to create a process with isolated memory, which can be reloaded

This resulted in a hanging process once in a while. I learned that mixing the two is a bad idea: the forked child inherits the parent's locks, but not the threads that hold them, so it can end up waiting forever for a non-existent thread to give up its lock.
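Boiled down, the problem looks something like this sketch (simplified, not my actual code): the child inherits the lock in its locked state, but not the worker thread that would eventually release it.

```python
import os
import threading
import time

lock = threading.Lock()

def worker():
    with lock:             # the thread grabs the lock ...
        time.sleep(1)      # ... and is still holding it when the fork happens

threading.Thread(target=worker).start()
time.sleep(0.1)            # give the worker time to acquire the lock

pid = os.fork()
if pid == 0:
    # child: the worker thread was not copied, so nothing will ever release
    # the inherited, already-locked lock
    print("child got the lock:", lock.acquire(timeout=2))   # -> False
    os._exit(0)
os.waitpid(pid, 0)
```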

I am thinking of rewriting my code to use asyncio for the background tasks instead, but I am not sure whether this solves my problem, since I am not familiar with how asyncio works under the hood. Does asyncio use locks to perform its context switching? Are the running coroutines inherited by the forked process? Could the forked process still get stuck somehow?
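For context, a much simplified sketch of the structure I have in mind (all names are placeholders):

```python
import asyncio
import os

async def websocket_listener():
    # background task that used to run in a thread
    while True:
        await asyncio.sleep(1)      # stand-in for reading from the websocket

def run_reloadable_code():
    ...                             # placeholder: import and run the user code

async def supervise_child():
    while True:
        pid = os.fork()             # throw-away copy of the current memory
        if pid == 0:
            run_reloadable_code()
            os._exit(0)
        # wait for the child to exit (e.g. after a reload request) without
        # blocking the event loop
        while os.waitpid(pid, os.WNOHANG) == (0, 0):
            await asyncio.sleep(0.5)

async def main():
    listener = asyncio.create_task(websocket_listener())   # keep a reference
    await supervise_child()

asyncio.run(main())
```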

The linked article suggests a few alternatives to solve this issue, which are not applicable in my case:

  • use spawn instead of fork, so the current memory is not copied and the child starts as a fresh process
  • fork before the threads are spun off
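For reference, the second alternative would amount to reordering the startup roughly like this (placeholder names); it does not fit my case because the child is discarded and forked again on every reload, long after the threads have been started.

```python
import os
import threading

def run_reloadable_code():
    ...                        # placeholder for the code that runs in the child

# alternative 2: fork while the process is still single-threaded ...
pid = os.fork()
if pid == 0:
    run_reloadable_code()      # no locks held by other threads can be inherited here
    os._exit(0)

# ... and only then start the background threads in the parent
threading.Thread(target=lambda: None, daemon=True).start()   # stand-in for the websocket thread
```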
  • Have you tried telling multiprocessing to not fork, i.e. to exec after fork: `from multiprocessing import set_start_method; set_start_method("spawn")` (mentioned near the bottom of the first article)? As far as I know, that completely solves the issue and no further workarounds are needed. – user4815162342 Aug 25 '20 at 14:24
  • As for asyncio, it can integrate with multiprocessing through `run_in_executor`, which can be passed a `concurrent.futures.ProcessPoolExecutor`. Note that that still uses `multiprocessing` under the hood, and asyncio (and other parts of Python) reserve the right to occasionally use threads as an implementation detail (a minimal sketch is given after these comments). – user4815162342 Aug 25 '20 at 14:24
  • I have considered using spawn (as mentioned in my question), but I prefer not to go this route since it would require a significant rewrite: the child process uses a significant amount of the parent's state. Also, the shift from threads to asyncio has been an improvement overall, so I am particularly interested to know whether I still have to worry about deadlocks. Will have a look at your `run_in_executor` suggestion, I am not familiar with it – Matthijs Aug 25 '20 at 17:01
  • If you are actually relying on the _semantics_ of `fork()`, then IMO you're not using `multiprocessing` correctly. I don't think `asyncio` will work any better for that use case, sorry. – user4815162342 Aug 25 '20 at 17:03
  • The reason I am forking is that I need to reload Python code upon file changes (similar to Django autoreload). The only reliable method I know is forking the process, importing the code, and upon reload discarding the child process and forking again. If you know a better way without using processes, please enlighten me. importlib.reload is not a viable option, as imports of the form `from ... import ..` will not function correctly. – Matthijs Aug 25 '20 at 17:29
  • And could you clarify why this is "relying on the semantics"? I am specifically interested in keeping the current memory. – Matthijs Aug 25 '20 at 17:30
  • In Python, `multiprocessing` normally refers to a [standard library module](https://docs.python.org/3/library/multiprocessing.html) which exposes an interface to multi-process execution that _doesn't_ rely on the semantics of the `fork()` system call. This is in part because `fork()` is not supported on all platforms where Python is (khm-Windows-khm), but also because of obscure interactions like the one you discovered. Multiprocessing supports passing objects between processes using serialization, or sharing them in shared memory, and provides an array of tools for those purposes. – user4815162342 Aug 25 '20 at 17:50
  • Rephrased my question to state the explicit need for fork (instead of multiprocessing) – Matthijs Aug 25 '20 at 21:45
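A minimal sketch of the `run_in_executor` suggestion from the comments, assuming a spawn context and a placeholder `load_and_run` function (neither is part of the original code):

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def load_and_run(path):
    # placeholder: import the user code in the worker process and run it
    return f"ran {path}"

async def main():
    loop = asyncio.get_running_loop()
    # a "spawn" context starts the worker fresh, so it cannot inherit a lock
    # held by one of the parent's threads
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # the blocking call occupies a worker process, not the event loop
        result = await loop.run_in_executor(pool, load_and_run, "user_module.py")
        print(result)

if __name__ == "__main__":   # required with spawn: the module is re-imported in the worker
    asyncio.run(main())
```

Whether a spawn-based worker can replace the fork-based reloading depends on how much of the parent's state the child really needs.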
