I have the following code:
```python
import numpy as np
from multiprocess import Pool

data = np.zeros((50, 50))

def foo():
    # data = np.zeros((50, 50))  # This slows the code.
    def bar():
        data.shape

    with Pool() as pool:
        async_results = [pool.apply_async(bar) for x in range(20000)]
        out = [async_result.get() for async_result in async_results]

foo()
```
As written, it takes 3 seconds to run. But when I uncomment the first line of `foo()`, it takes 10 seconds.

Commenting out the initial definition of `data` doesn't fix the issue, so I think the bottleneck isn't when `data` is initialized. I suspect the problem is passing `data` to each of the worker processes, but I can't confirm this. And I don't know why defining `data` outside of `foo` would help.
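Since `multiprocess` serializes each submitted task with `dill`, one way to probe the "passing `data`" suspicion is to compare the size of the serialized payload in the two cases. Below is a minimal sketch under that assumption; the factory functions `make_global_bar` and `make_closure_bar` are hypothetical names introduced here just to build the two variants of `bar`:

```python
import numpy as np
import dill  # the serializer multiprocess uses for tasks

data = np.zeros((50, 50))

def make_global_bar():
    # 'data' is never assigned here, so bar refers to the module-level global
    def bar():
        data.shape
    return bar

def make_closure_bar():
    data = np.zeros((50, 50))  # local variable: bar now closes over the array
    def bar():
        data.shape
    return bar

# If the closure variant is roughly 50*50*8 = 20 KB larger, that would
# suggest the array is re-serialized with every apply_async call.
print(len(dill.dumps(make_global_bar())))
print(len(dill.dumps(make_closure_bar())))
```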
Why is there a discrepancy in speeds?