
I have an embarrassingly parallel loop:

```python
import os

# Definitions

def exhaustiveExplorationsWithSimilarityAll(inputFolder, outputFolder, similarityMeasure):
    phasesSpeedupDictFolder = parsePhasesSpeedupDictFolder(inputFolder)
    avgSpeedupProgramDict = computeAvgSpeedupProgram(phasesSpeedupDictFolder)
    parameters = {
        PROGRAMSPHASESSPEEDUPDICTS: phasesSpeedupDictFolder,
        PROGRAMSAVGSPEEDUPDICT: avgSpeedupProgramDict
    }
    similarityHandler = SimilarityHandler(similarityMeasure, parameters)

    # Sequential running
    for fileName in os.listdir(inputFolder):
        print(fileName)
        exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)
```

I would like to parallelize it with joblib's `Parallel`:

```python
# Parallel version

import multiprocessing
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()

parallel = Parallel(n_jobs=num_cores)
for fileName in os.listdir(inputFolder):
    print(fileName)
    parallel(delayed(exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)))
```

Or another version:

```python
arg_generator = ((inputFolder + fileName, outputFolder + fileName, similarityHandler) for fileName in os.listdir(inputFolder))
parallel(delayed(exhaustiveExplorationsWithSimilarity)(arg_generator))
```

But when I run it, it complains with:

```
parallel(delayed(exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)))
  File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
TypeError: 'function' object is not iterable
```

What am I missing here? Any help is appreciated.

Amir
  • Hi Amir, it looks like this program has lots of indentation issues. Can you please fix those so that we can understand what the program is actually doing? – 2ps Jul 04 '16 at 15:51
  • It looks like you are still actually *calling* `exhaustiveExplorationsWithSimilarity` inside your loop, and then passing the *result* of that to delayed. Probably you need to just pass the function and the arguments to delayed? – Tom Dalton Jul 04 '16 at 16:08
  • Let me try your suggestion. Yes, you are right. – Amir Jul 04 '16 at 16:29

1 Answer


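You are still calling `exhaustiveExplorationsWithSimilarity` (serially) inside your loop, and then passing its result to `delayed`. `delayed` should be given the function itself; the wrapper it returns is then called with the arguments, once per file.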

From the docs https://pythonhosted.org/joblib/parallel.html#common-usage, it looks like you need to do something like:

```python
parallel = Parallel(n_jobs=num_cores)
parallel(delayed(exhaustiveExplorationsWithSimilarity)(inputFolder + fileName, outputFolder + fileName, similarityHandler)
         for fileName in os.listdir(inputFolder))
```
Tom Dalton
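To see where the error comes from, here is a small stdlib-only sketch. It does not use joblib: `fake_delayed` and `fake_parallel` are hypothetical stand-ins that mimic the protocol visible in the traceback. `delayed(f)(x, y)` does not run `f`; it returns a `(f, (x, y), {})` tuple, and `Parallel.__call__` loops over an iterable of such tuples (`for function, args, kwargs in iterable`). Handing `delayed` the *result* of a call gives `Parallel` a bare function instead, which is exactly the `'function' object is not iterable` error. `explore` is a hypothetical stand-in for `exhaustiveExplorationsWithSimilarity`, and the sketch runs the tasks serially.

```python
from __future__ import print_function


def fake_delayed(function):
    """Stand-in for joblib.delayed: calling the returned wrapper does not
    run `function`; it only records the call as (function, args, kwargs)."""
    def record_call(*args, **kwargs):
        return function, args, kwargs
    return record_call


def fake_parallel(iterable):
    """Stand-in for Parallel.__call__, run serially. Like the loop in the
    traceback, it expects an iterable of (function, args, kwargs) tuples."""
    return [function(*args, **kwargs) for function, args, kwargs in iterable]


def explore(input_path, output_path, handler):
    """Hypothetical stand-in for exhaustiveExplorationsWithSimilarity."""
    return "%s -> %s (%s)" % (input_path, output_path, handler)


file_names = ["a.txt", "b.txt"]

# Wrong: explore(...) runs right away, serially, and only its *result* is
# wrapped, so fake_parallel receives a single function, not an iterable.
try:
    fake_parallel(fake_delayed(explore("in/a.txt", "out/a.txt", "h")))
except TypeError as exc:
    wrong_message = str(exc)
print("wrong:", wrong_message)  # wrong: 'function' object is not iterable

# Right: wrap the function itself, then "call" the wrapper once per file
# inside a generator expression, so each item is (function, args, kwargs).
right = fake_parallel(fake_delayed(explore)("in/" + name, "out/" + name, "h")
                      for name in file_names)
print(right)

# The arg_generator variant also works once each tuple is unpacked with *.
arg_generator = (("in/" + name, "out/" + name, "h") for name in file_names)
via_args = fake_parallel(fake_delayed(explore)(*args) for args in arg_generator)
print(via_args == right)  # True
```

With the real library, the `arg_generator` version from the question would correspondingly become `parallel(delayed(exhaustiveExplorationsWithSimilarity)(*args) for args in arg_generator)`. As originally written, it passes the whole generator as one argument and gives `Parallel` a single tuple rather than an iterable of tuples.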
  • The same error: `File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 516, in __call__ for function, args, kwargs in iterable: TypeError: 'function' object is not iterable` – Amir Jul 04 '16 at 16:45
  • I guess we should make `fileName` iterable, right? – Amir Jul 04 '16 at 16:46
  • Not 100% sure what you mean by that, so... maybe! – Tom Dalton Jul 04 '16 at 21:22
  • So, any suggestions on why I keep receiving this error? – Amir Jul 04 '16 at 22:55
  • Yes, this is working now. Can you elaborate on the change? And can you explain how I can reuse the same pool for more commands? For example, on every iteration I would like to `print fileName` to see the progress. – Amir Jul 05 '16 at 11:49
  • I don't really know much about `joblib`; I've tended to use the `multiprocessing` module directly. I have to say I don't really like the syntax of `Parallel`/`delayed`, and I hadn't seen it before. Since you want to do more work in the 'tasks', I would either use `multiprocessing` directly or perhaps `Celery`. – Tom Dalton Jul 06 '16 at 10:52
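Following the suggestion to use `multiprocessing` directly, here is a minimal sketch (not a joblib API) of one way to get per-file progress output while reusing a single pool. It assumes the per-file work can be wrapped in a top-level function that takes one job tuple. `explore_one` and `run_all` are hypothetical names, and `explore_one` only echoes its inputs where `exhaustiveExplorationsWithSimilarity` would do the real work. `imap_unordered` yields each result as soon as a worker finishes it, so the parent process can print progress, and the same `Pool` can accept further `map`/`imap` calls until it is closed. The sketch also uses `os.path.join` instead of `inputFolder + fileName`, so the folders need no trailing slash.

```python
import multiprocessing
import os


def explore_one(job):
    """Hypothetical per-file task standing in for
    exhaustiveExplorationsWithSimilarity; runs in a worker process."""
    input_path, output_path, handler = job
    # Real work would go here; this sketch only echoes its inputs.
    return input_path, output_path, handler


def run_all(input_folder, output_folder, file_names, handler, processes=None):
    """Process every file on one pool and print progress in the parent."""
    # The handler travels to the workers by pickling, so it must be picklable.
    jobs = [(os.path.join(input_folder, name),
             os.path.join(output_folder, name),
             handler)
            for name in file_names]
    results = []
    pool = multiprocessing.Pool(processes=processes)  # None -> one per CPU
    try:
        # imap_unordered hands back each result as soon as a worker finishes
        # it, so progress is reported while other tasks are still running.
        for done, result in enumerate(pool.imap_unordered(explore_one, jobs), 1):
            print("%d/%d done: %s" % (done, len(jobs), result[0]))
            results.append(result)
        # The same pool could run further map/imap calls here before closing.
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    run_all("in", "out", ["a.txt", "b.txt", "c.txt"], handler="similarity-handler")
```

If you stay with joblib instead, `Parallel` also accepts a `verbose` argument that makes it print progress messages as tasks complete. Printing inside the task itself also works, but the lines then come from the worker processes in whatever order they finish.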