
I have wrapped a small C library in Cython and can successfully call it from Python. Here's a simplified example of how things currently stand:

# wrapper_module.pyx

cdef extern from "my_c_library.h":
    void function1()
    int function2(int param1, char *param2)

class MyWrapperClass():
    def __init__(self):
        pass
    def do_func1(self):
        function1()
    def do_func2(self, p1, p2):
        # p2 must be a bytes object so Cython can convert it to char *
        return function2(p1, p2)

This all works well. My goal now is to create and use an instance of MyWrapperClass in a separate process, like this:

# my_script.py

import multiprocessing as mp
from wrapper_module import MyWrapperClass

class MyProcess(mp.Process):
    def __init__(self):
        super().__init__()

    def run(self):
        self.mwc = MyWrapperClass()
        self.mwc.do_func1()
        # ... do more

if __name__ == '__main__':
    proc = MyProcess()
    proc.start()

When I run my_script.py, I get the following error:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec(). Break on THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC() to debug.

I think I understand, generally, why this Cython module cannot be forked into a different process (due to resources that the underlying C library uses). But when I have encountered this type of problem in the past, either in pure Python or when calling DLLs via ctypes, I have been able to solve it by placing all the critical code in the `run` method of `MyProcess`, so that it is only initialized in the newly forked process (sketched below).
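
For reference, here is a minimal sketch of that pattern as I have used it with ctypes (the library name and its foo_init function are hypothetical, purely to illustrate the idea):

# deferred_load.py -- illustrative sketch only

import ctypes
import multiprocessing as mp

class MyCtypesProcess(mp.Process):
    def run(self):
        # The shared library is loaded here, in the child process only,
        # so the parent never touches the library's resources.
        lib = ctypes.CDLL('libfoo.dylib')  # hypothetical library
        lib.foo_init()                     # hypothetical init function

if __name__ == '__main__':
    proc = MyCtypesProcess()
    proc.start()
    proc.join()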

In this Cython case, however, I don't know how to include the `cdef extern` code only in the forked process, to avoid this error. Any suggestions?

(I am running python 3.6.2 on macOS 10.12)

benson
    I cannot test this as the wrapper is not available but importing `MyWrapperClass` only in the `run` block of the new process should let it continue. If I understand correctly, the C library does not allow itself to be used in forked processes. Since the shared library is only loaded on `import` in python, importing it only in the newly spawned process but not its parent should be ok. – danny Sep 28 '17 at 15:35
  • Yes @danny, exactly, so I had really hoped that would work, but for some reason it still throws that same error. That is, if I put the `from wrapper_module import MyWrapperClass` into the `MyProcess.run` method, it still has the same issue. Do you know if there's something about processes spawned by `multiprocessing` that is inherently different from the main process? Otherwise I cannot imagine how the process would even know to throw this error. – benson Sep 28 '17 at 15:46
  • They are processes whose parent PID is not 1 (the init process), in other words, processes spawned by another process. But that applies to all processes spawned by any other process, like the shell. I think what is happening is that the `import` statement causes the library to be loaded in both the parent and the new process. You could, though, run two separate Python processes and connect them via IPC, meaning not spawn a process from Python code at all, but use two Python scripts, one importing `MyWrapper`, one not, with a shared multiprocessing pipe or shared memory for communication between them (a sketch of this follows these comments). – danny Sep 28 '17 at 15:51
  • Hm I see what you mean, except I'm not sure how I could implement that in the context of a package...the goal is for one to be able to simply import MyProcess from the package and use it. – benson Sep 28 '17 at 15:56
  • I do wonder whether this has anything to do with the fundamental issue: https://stackoverflow.com/questions/15991036/why-cant-i-use-cocoa-frameworks-in-different-forked-processes – benson Sep 28 '17 at 15:56
  • There's no rule saying that you cannot spawn a Python interpreter in a new multiprocessing process (other than that usually you do not need to), so that is an option. E.g., make the `run` be `python ` and communicate with it via IPC. But I wonder if this is even necessary. You have a C library that presumably releases the GIL. Is `threading` not an option? Or does that not work either? – danny Sep 28 '17 at 16:02
  • Thanks @danny I suppose I should try a threading-based solution. I was avoiding this because I'm trying to keep as much of this as possible in python. But that does seem like a possible workaround – benson Sep 28 '17 at 16:17
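
A minimal sketch of the two-script approach described above, using multiprocessing.connection for the IPC (the address, authkey, and message protocol are assumptions for illustration, and this is untested against the real wrapper):

# worker.py -- the only place wrapper_module is ever imported

from multiprocessing.connection import Client
from wrapper_module import MyWrapperClass

conn = Client(('localhost', 6000), authkey=b'secret')
mwc = MyWrapperClass()
while True:
    msg = conn.recv()
    if msg == 'quit':
        break
    if msg == 'func1':
        mwc.do_func1()
conn.close()

# main.py -- never imports wrapper_module; starts worker.py as a fresh interpreter

import subprocess
import sys
from multiprocessing.connection import Listener

listener = Listener(('localhost', 6000), authkey=b'secret')
worker = subprocess.Popen([sys.executable, 'worker.py'])
conn = listener.accept()  # blocks until worker.py connects
conn.send('func1')
conn.send('quit')
conn.close()
listener.close()
worker.wait()

Because the worker is started fresh (fork plus exec, via subprocess) rather than by a bare fork, this should sidestep the CoreFoundation restriction the error message complains about.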

1 Answer


The code in a `cdef extern` block consists only of declarations for external C functions (let's consider only C). These declarations are put into the generated C file (almost) without any changes; they have nothing to do with initialization.
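
Roughly speaking, for the wrapper in the question the generated C just gains an #include and direct calls. The fragment below is a simplified sketch of that, not the literal Cython output:

/* Simplified sketch: the cdef extern block becomes an include plus
   direct calls, with no initialization code of any kind. */
#include <Python.h>
#include "my_c_library.h"

static PyObject *do_func1_method(PyObject *self, PyObject *unused)
{
    function1();  /* a plain call into the C library */
    Py_RETURN_NONE;
}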

I also made a rather simple test (Python 3.6, Windows):

lib.c:

int add(int a, int b)
{
    return a + b;
}

mylib.pyx:

cdef extern from "lib.c":
    int _add "add"(int a, int b)  # C symbol "add", exposed to Cython as _add

cdef class WrapperClass:
    def add(self, a, b):
        return _add(a, b)

test_lib.py:

from mylib import WrapperClass
import multiprocessing as mp

class MyProcess(mp.Process):
    def __init__(self):
        super().__init__()

    def run(self):
        self.mwc = WrapperClass()
        print(self.mwc.add(1, 2))


if __name__ == "__main__":
    proc = MyProcess()
    proc.start()
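
To build the extension I used a standard setuptools/Cython script (a minimal sketch; lib.c is pulled in by the cdef extern from "lib.c" line, so it needs no separate source entry):

setup.py:

from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("mylib.pyx"))

Build it with python setup.py build_ext --inplace.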

After that, python test_lib.py runs without any problem. Maybe my C code is too simple, but I have no idea what your C functions really do; maybe that's the real problem.

oz1
  • Yeah, but this is not the scenario I'm describing, as it does not raise the forked-process error. As I mentioned, this error has to do with single-process restrictions in the underlying C library. Your example C code has no such restrictions and thus doesn't raise that error. When you import mylib, it runs that code in the main process, and it is able to fork it off when you start the new process. But that is not always the case; in fact, it's not the case any time a non-shareable resource is used. In those cases, the relevant C code needs to be instantiated *only* in the run method of the process. – benson Sep 28 '17 at 13:33