I am using numba's `@njit` decorator to compile a function that gets used in parallel processes, but it is slower than I expected. Processes that should differ by an order of magnitude in execution time all take roughly the same time, which makes it look like there is a lot of compilation overhead.
I have a function

    from numba import njit

    @njit
    def foo(ar):
        # (do something)
        return ar
and a normal Python function

    def bar(x):
        # (do something)
        return foo(x)
which gets called in parallel processes like

    import concurrent.futures

    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(max_workers=maxWorkers) as executor:
            results = executor.map(bar, args)

where `args` is a long list of arguments.
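To make this concrete, here is a self-contained sketch of what I am doing (the body of `foo` and the array sizes are placeholders for my real workload), with timing added so that a per-worker compilation cost on the first call would show up in the output:

    import concurrent.futures
    import time

    import numpy as np
    from numba import njit


    @njit
    def foo(ar):
        # placeholder for my real function
        return ar * 2.0


    def bar(x):
        # time the first and second call of foo inside the worker,
        # so any per-process compilation cost shows up in the first timing
        t0 = time.perf_counter()
        foo(x)
        t1 = time.perf_counter()
        foo(x)
        t2 = time.perf_counter()
        return t1 - t0, t2 - t1


    if __name__ == "__main__":
        args = [np.ones(1000) for _ in range(8)]
        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
            for first, second in executor.map(bar, args):
                print(f"first call: {first:.3f} s, second call: {second:.6f} s")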
Does this mean that `foo()` gets compiled separately within each process? That would explain the extra overhead. Is there a good solution for this? I could just call `foo()` once on one of the arguments before spawning the processes, forcing it to compile ahead of time. Is there a better way?
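For concreteness, the warm-up workaround I am considering would look roughly like this (a sketch only; the bodies of `foo` and `bar` and the test arguments are placeholders for my real code). As far as I understand, this can only help when the workers are forked from the parent (the default start method on Linux), because spawned workers re-import the module and would compile `foo` again:

    import concurrent.futures

    import numpy as np
    from numba import njit


    @njit
    def foo(ar):
        # placeholder for my real function
        return ar


    def bar(x):
        # placeholder for my real wrapper
        return foo(x)


    if __name__ == "__main__":
        args = [np.ones(1000) for _ in range(100)]

        # Warm-up call: compile foo in the parent process before the pool
        # is created, so forked workers inherit the compiled function.
        foo(args[0])

        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
            results = list(executor.map(bar, args))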