I have a large program that takes a while to run its calculations. I decided to learn about multithreading and multiprocessing because only 20% of my processor was being used. After seeing no improvement with multithreading, I tried multiprocessing, but whenever I use it I get a lot of errors, even with very simple code.

This is the code I tested after I started having problems with my large, calculation-heavy program:

from concurrent.futures import ProcessPoolExecutor

def func():
    print("done")

def func_():
    print("done")

def main():
    executor = ProcessPoolExecutor(max_workers=3)

    p1 = executor.submit(func)
    p2 = executor.submit(func_)

main()

and the error message I am getting says:

An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

This is not the whole message because it is very long, but I think it may be helpful. Pretty much everything else in the error message is just lines like "error at line ... in ...".

In case it is helpful, the full code is at: https://github.com/nobody48sheldor/fuseeinator2.0 (it might not be the latest version).

  • You are running on Windows? How do you call `main()`? That's where the problem is. On Windows, multiprocessing starts a new python and then imports your module. Code at module level runs a second time. If you call `main()`, it gets called again in the subprocess and your whole program re-executes. Python detected that and gave the warning instead. Stuff inside `if __name__ == "__main__":` is not executed on mere importing. So stick your top level script code inside there and it works. – tdelaney Apr 19 '21 at 01:39
  • Yes, I'm running on Windows. But I don't understand why my code would run twice, because the function is only called at the end. Even if I don't put the multiprocessing code in a main function, it still shows the same message – nobody48sheldor Apr 19 '21 at 01:48
  • But where do you call `main()`? Can you add that to the example code to make it runnable? See _Safe importing of main module_ in [The spawn and forkserver start methods](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods) - Windows spawns. – tdelaney Apr 19 '21 at 02:14
  • Ah, my bad, I forgot to paste it, but I call main() at the end. By the way, I am not using the multiprocessing library but concurrent.futures. I know it is very similar, but the way to use it is not the same, and in my big code I have to return lists, so I prefer concurrent.futures – nobody48sheldor Apr 19 '21 at 02:28

1 Answer

I updated your code to show main being called. This is an issue on operating systems that spawn new processes, like Windows. To reproduce it on my Linux machine I had to add a bit of code, and with that addition this crashes on my machine:

# Test code to make Linux spawn like Windows and generate the error. This code
# is not needed on Windows.
if __name__ == "__main__":
    import multiprocessing as mp
    mp.freeze_support()
    mp.set_start_method('spawn')

# test script
from concurrent.futures import ProcessPoolExecutor

def func():
    print("done")

def func_():
    print("done")

def main():
    executor = ProcessPoolExecutor(max_workers=3)
    p1 = executor.submit(func)
    p2 = executor.submit(func_)

main()

In a spawning system, Python can't just fork into a new execution context. Instead, it starts a new instance of the Python interpreter, imports the module, and pickles/unpickles enough state to build the child's execution environment. This can be a very heavy operation.
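To make the pickling step concrete, here is a minimal sketch that is not part of the original post. It suggests why the child has to import your module first: a module-level function is pickled as a reference to its name, not as code. The names work and can_pickle are made up for illustration.

```python
import pickle


def work(n):
    # Hypothetical worker that returns a list, as the question says its real code must.
    return [i * i for i in range(n)]


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A module-level function is pickled as a reference to its name, so
# unpickling looks that name up again in the imported module.
payload = pickle.dumps(work)
restored = pickle.loads(payload)
print(restored is work)

# A lambda has no importable name, so it cannot be sent to a worker process.
print(can_pickle(lambda x: x))
```

This is also why functions handed to executor.submit should be defined at module level rather than as lambdas or nested functions.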

But your script is not import safe. Since main() is called at module level, the import in the child would run main again. That would create a grandchild subprocess which runs main again (and so on, until you hang your machine). Python detects this infinite loop and raises the error from your question instead.

The top-level script is always named "__main__". Put all of the script-level code that should only run once inside an if block that checks for that name. Then, if the module is merely imported, nothing harmful is run.

if __name__ == "__main__":
    main()

and the script will work.
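For reference, a complete guarded script might look like the sketch below. It is only an illustration under some assumptions: squares and cubes are hypothetical stand-ins for your real calculations, and they return lists because you mentioned needing that. The results come back through future.result().

```python
from concurrent.futures import ProcessPoolExecutor


def squares(n):
    # Hypothetical stand-in for one heavy calculation.
    return [i * i for i in range(n)]


def cubes(n):
    # Hypothetical stand-in for another heavy calculation.
    return [i ** 3 for i in range(n)]


def main():
    # The with block waits for the workers and shuts the pool down cleanly.
    with ProcessPoolExecutor(max_workers=3) as executor:
        f1 = executor.submit(squares, 5)
        f2 = executor.submit(cubes, 5)
        # result() blocks until the worker has finished and returns its list.
        print(f1.result())
        print(f2.result())


# Only the process that was started as a script gets past this check;
# a child that merely imports the module skips it.
if __name__ == "__main__":
    main()
```

Anything that uses the returned lists, such as plotting, can go in main() or in a function called from main(), so it still runs only in the parent process.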

There are code analyzers out there that import modules to extract doc strings, or other useful stuff. Your code shouldn't fire the missiles just because some tool did an import.
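To see the guard doing its job on a plain import, here is a small self-contained sketch. It writes a throwaway module (demo_mod.py, a made-up name) to a temporary directory, imports it, and then runs it as a script. The helper name and the module body are assumptions for illustration only.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical module body: records whether the guarded block ran.
SOURCE = (
    "guard_ran = False\n"
    "if __name__ == '__main__':\n"
    "    guard_ran = True\n"
)


def guard_ran_when_imported_and_when_run():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_mod.py"
        path.write_text(SOURCE)

        # Import the file under the name "demo_mod", as a tool or a
        # child process importing the module would.
        spec = importlib.util.spec_from_file_location("demo_mod", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Execute the same file with __name__ set to "__main__", as
        # "python demo_mod.py" would.
        script_globals = runpy.run_path(str(path), run_name="__main__")

    return module.guard_ran, script_globals["guard_ran"]


print(guard_ran_when_imported_and_when_run())
```

The first value is for the import and the second for the script run; only the script run enters the guarded block.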

Another way to structure the program is to move everything multiprocessing related out of the script and into a module. Suppose I had a module with your code in it:

whatever.py

from concurrent.futures import ProcessPoolExecutor

def func():
    print("done")

def func_():
    print("done")

def main():
    executor = ProcessPoolExecutor(max_workers=3)

    p1 = executor.submit(func)
    p2 = executor.submit(func_)

myscript.py

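#!/usr/bin/env python3
import whatever

# Still guarded: spawned children re-import this script too.
if __name__ == "__main__":
    whatever.main()

The pool now lives in an imported module, and the top-level script stays small. The if __name__ == "__main__": guard is still needed in myscript.py, though: on a spawning system each child re-imports the top-level script, so an unguarded whatever.main() there would run again in every child and produce the same error.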

tdelaney
  • No, no, no! You still need `if __name__ == '__main__': whatever.main()`. – Booboo Apr 19 '21 at 10:55
  • Thanks, it works! But now I have to do all the plotting inside the main() function, and in my big code I build a figure with a lot of plots. How can I do the plotting outside main() without creating a circular import? – nobody48sheldor Apr 19 '21 at 14:02
  • @nobody48sheldor - you can put them in another function that is called from main. The trick is to make sure that there is no path to the code that runs merely from import. – tdelaney Apr 19 '21 at 21:51
  • I like the second solution generally, where code that does stuff is separated into importable modules while the top-level scripts that start a Python program are relatively small and concerned with command line parsing, etc. That way, the bulk of the code is more easily unit tested. – tdelaney Apr 19 '21 at 21:53
  • Thank you all! I finally got it working! – nobody48sheldor Apr 20 '21 at 00:20