I am new to parallelization in general and concurrent.futures in particular. I want to benchmark my script and compare the differences between using threads and processes, but I found that I couldn't even get that running because when using ProcessPoolExecutor
I cannot use my global variables.
The following code will output Hello
as I expect, but when you change ThreadPoolExecutor
for ProcessPoolExecutor
, it will output None
.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
greeting = None
def process():
print(greeting)
return None
def main():
with ThreadPoolExecutor(max_workers=1) as executor:
executor.submit(process)
return None
def init():
global greeting
greeting = 'Hello'
return None
if __name__ == '__main__':
init()
main()
I don't understand why this is the case. In my real program, init is used to set the global variables to CLI arguments, and there are a lot of them. Hence, passing them as arguments does not seem recommended. So how do I pass those global variables to each process/thread correctly?
I know that I can change things around, which will work, but I don't understand why. E.g. the following works for both Executors, but it also means that the globals initialisation has to happen for every instance.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
greeting = None
def init():
global greeting
greeting = 'Hello'
return None
def main():
with ThreadPoolExecutor(max_workers=1) as executor:
executor.submit(process)
return None
def process():
init()
print(greeting)
return None
if __name__ == '__main__':
main()
So my main question is, what is actually happening. Why does this code work with threads and not with processes? And, how do I correctly pass set globals to each process/thread without having to re-initialise them for every instance?
(Side note: because I have read that concurrent.futures might behave differently on Windows, I have to note that I am running Python 3.6 on Windows 10 64 bit.)