I'm using joblib to parallelize my python 3.5 code.
If I do:

from modules import f
from joblib import Parallel, delayed

if __name__ == '__main__':
    Parallel(n_jobs=n_jobs, backend="multiprocessing")(delayed(f)(i) for i in range(10))

the code doesn't work. If I do this instead:

from joblib import Parallel, delayed

def f(i):
    # my func ...
    pass

if __name__ == '__main__':
    Parallel(n_jobs=n_jobs, backend="multiprocessing")(delayed(f)(i) for i in range(10))

This works!

Can someone explain why I have to put all my functions in the same script?

That is really impractical, because "modules" contains plenty of functions I wrote that I don't want to copy and paste into the main script.

Grg
  • 1
    Welcome. Would you please be so kind as to mark your code as such, in order to make it readable? – rocksteady Oct 19 '17 at 11:01
  • 2
    Can you please post more information on why your code is not working? Error messages, observed behaviour, etc. – Nils Werner Oct 19 '17 at 11:20
  • I don't get an error. The code starts, but when it reaches the Parallel call it stops responding, as if it had entered an infinite loop. I thought it might be failing to find the functions defined in "modules", but "modules" is in the same folder as the main script, so it should be able to find them. Also, the non-parallel version works well. – Grg Oct 20 '17 at 14:39
  • This is similar to the issue reported here: https://github.com/joblib/joblib/issues/911. However, I cannot reproduce it myself. Could someone experiencing this issue please report the full traceback if you can reproduce it with joblib 0.14.0 or later? – ogrisel Dec 08 '19 at 18:09

2 Answers

I faced a similar issue. When I call an imported function, it just freezes, and when I call a local function it works OK. I solved it by using multithreading instead of multiprocessing, like this:

Parallel(n_jobs=n_jobs, backend='threading')(delayed(f)(i) for i in range(10))
  • 1
    Note that this is not a good solution for CPU bound work that doesn't release the GIL since all threads share the same instance of the GIL. – kabla002 Aug 27 '19 at 14:58
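For reference, a minimal self-contained sketch of the threading approach from this answer. Here `math.sqrt` is only a stand-in for a function imported from another module, and `run_threaded` is a hypothetical helper name, not something from the original post. As the comment above points out, this mainly helps when the work releases the GIL (I/O, many NumPy calls); pure-Python CPU-bound work will not speed up this way.

```python
from math import sqrt  # stand-in for `from modules import f` (assumption)

from joblib import Parallel, delayed


def run_threaded(n_jobs=2):
    # backend="threading" runs the calls in threads of the current process,
    # so the function does not have to be sent to a separate worker process.
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(sqrt)(i) for i in range(10)
    )


if __name__ == "__main__":
    print(run_threaded())
```

Results come back in the same order as the inputs, so the returned list lines up with `range(10)`.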

I found a workaround that allows you to keep the helper functions in a separate module. For each imported function that you want to parallelize, define a proxy function in your main module, e.g.

def f_proxy(*args, **kwargs):
    return f(*args, **kwargs)

and simply use delayed(f_proxy). It is still somewhat unsatisfactory, but cleaner than moving all helper functions into the main module.
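Put together, the workaround might look like the sketch below. It is only an illustration: `math.factorial` stands in for the imported `f` from "modules", and `run_with_proxy` with its `n_jobs` and `backend` defaults is a hypothetical helper added here so the example is runnable on its own.

```python
from math import factorial  # stand-in for `from modules import f` (assumption)

from joblib import Parallel, delayed


def f_proxy(*args, **kwargs):
    # Thin wrapper defined in the main script; it only forwards the call
    # to the imported function.
    return factorial(*args, **kwargs)


def run_with_proxy(n_jobs=2, backend="multiprocessing"):
    # Parallelize the proxy rather than the imported function itself.
    return Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(f_proxy)(i) for i in range(10)
    )


if __name__ == "__main__":
    print(run_with_proxy())
```

Each imported function needs its own proxy, so this stays manageable only when a handful of functions are parallelized.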

Ben JW