I am using IPython for some relatively heavy numerical tasks, subsets of which are more or less embarrassingly parallel. The tasks have very simple dependencies, but I'm struggling to work out the best way to implement them. The basic problem is that the result of a previous computation must be used in the following one, and I would like to submit those tasks to the engines separately.
Basically I've got
in0a = ....
in0b = ....
res1a = f1(in0a) ## expensive, would like to run on engine 0
res1b = f1(in0b) ## expensive, would like to run on engine 1
### and same for c, d, ... on engines 2, 3, ... (mod the number of engines)
res2a = f2(res1a) ### depends on res1a = f1(in0a) being computed
res2b = f2(res1b) ### depends on res1b = f1(in0b) being computed
I could restructure things into f_12() functions that call f1 and f2 in sequence and return both outputs as a tuple (I'd like the main engine to have access to all the results), and just submit those asynchronously; or I could use a parallel map of f1 over [in0a, in0b, ...]. But I would strongly prefer not to do either of those refactorings.
So what I really want to know is how I can use view.apply_async() so that res2a = f2(res1a) only runs once res1a = f1(in0a) has finished (and similarly for the b, c, d, ... tasks).
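To pin down the behavior I'm after, here is a stdlib-only analogy (concurrent.futures rather than IPython.parallel; f1, f2, and the chain() helper are toy names of mine): each f2 is submitted the moment its matching f1 finishes, and the submitting thread never blocks on a get().

```python
# Stdlib sketch of the desired scheduling (NOT IPython.parallel code):
# f2 is submitted as soon as its matching f1 completes, and the main
# thread never blocks while wiring up the dependencies.
from concurrent.futures import ThreadPoolExecutor, Future

def f1(x):          # toy stand-in for the expensive first stage
    return x + 1

def f2(x):          # toy stand-in for the dependent second stage
    return x * 10

def chain(pool, fn, upstream):
    """Return a Future for fn(upstream.result()), where fn is only
    submitted once upstream completes -- chaining itself never blocks."""
    downstream = Future()
    def _on_done(fut):
        inner = pool.submit(fn, fut.result())
        inner.add_done_callback(lambda f: downstream.set_result(f.result()))
    upstream.add_done_callback(_on_done)
    return downstream

with ThreadPoolExecutor(max_workers=4) as pool:
    res1 = [pool.submit(f1, x) for x in [0, 1, 2]]   # the f1(in0*) tasks
    res2 = [chain(pool, f2, ar) for ar in res1]      # each f2 waits on its f1
    print([r.result() for r in res2])                # -> [10, 20, 30]
```

This is only an illustration of the semantics I want; the question is how to get the same effect through IPython's views.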
Basically, I want an apply_async whose execution, but not submission, waits on its dependency. With load balancing it should be something like
res1a = v.apply_async(f1, in0a)
res2a = v.apply_async(f2, res1a)
But how do I make the second explicitly depend on the first? Do I need a with v.temp_flags(follow=res1a) block? And do I then need to pass res1a.get() in the call? Won't that just block submission until the result returns?
Or, how would I do this with a direct view? If I submit all of the 'a' tasks to the same engine but use v.apply_async(f2, res1a.get()), this blocks, and the f2 task isn't even submitted until the get() returns.
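For reference, the direct-view behavior I have in mind is like one single-threaded worker per engine: tasks sent to the same worker run strictly in submission order, so the second task can read what the first left behind, with no get() at submission time. Again a stdlib sketch with toy f1/f2 (the names and the namespace dict are mine, standing in for the engine-side namespace):

```python
# Stdlib sketch of the direct-view idea (NOT IPython.parallel code):
# a single-threaded executor plays the role of one engine, so tasks
# submitted to it execute strictly in FIFO order.
from concurrent.futures import ThreadPoolExecutor

def f1(x):          # toy stand-in for the expensive first stage
    return x + 1

def f2(x):          # toy stand-in for the dependent second stage
    return x * 10

engine = ThreadPoolExecutor(max_workers=1)  # one "engine": FIFO execution
namespace = {}                              # stands in for the engine namespace

# Submit both tasks immediately; the second is safe because FIFO ordering
# guarantees f1 has stored its result before f2 looks it up.
engine.submit(lambda: namespace.__setitem__('res1a', f1(41)))
res2a = engine.submit(lambda: f2(namespace['res1a']))
print(res2a.result())   # -> 420
engine.shutdown(wait=True)
```

That is the ordering guarantee I'd like to exploit, but without falling back on get() in the submitting process.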