
I am stuck in a strange place. I have a bunch of delayed function calls that I want to execute in a certain order. While executing in parallel is trivial:

res = client.compute([myfuncs])
res = client.gather(res)

I can't seem to find a way to execute them in sequence, in a non-blocking way.

Here's a minimal example:

import numpy as np
from time import sleep
from datetime import datetime

from dask import delayed
from dask.distributed import LocalCluster, Client


@delayed
def dosomething(name):
    res = {"name": name, "beg": datetime.now()}
    sleep(np.random.randint(10))
    res.update(rand=np.random.rand())
    res.update(end=datetime.now())
    return res


seq1 = [dosomething(name) for name in ["foo", "bar", "baz"]]
par1 = dosomething("whaat")
par2 = dosomething("ahem")
pipeline = [seq1, par1, par2]

Given the above example, I would like to run seq1, par1, and par2 in parallel, but the constituents of seq1 ("foo", "bar", and "baz") in sequence.

suvayu

1 Answer


You could definitely cheat and add an optional dependency to your function as follows:

@dask.delayed
def dosomething(name, *args):
    ...

This way you can make tasks depend on one another, even though you don't use one result in the next run of the function:

inputs = ["foo", "bar", "baz"]
seq1 = [dosomething(inputs[0])]
for bit in inputs[1:]:
    seq1.append(dosomething(bit, seq1[-1]))
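
For completeness, here is a minimal sketch of how the chained seq1 might then be submitted together with the independent tasks; it assumes a connected Client, and reuses par1 and par2 from the question's example:

# seq1[i] holds seq1[i-1] as a dummy argument, so "foo", "bar" and "baz"
# run one after another, while par1 and par2 are free to run alongside them.
futures = client.compute(seq1 + [par1, par2])  # returns immediately (non-blocking)
results = client.gather(futures)               # blocks until everything finishes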

Alternatively, you can read about the distributed scheduler's "futures" interface, whereby you can monitor the progress of tasks in real time.
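
As a rough sketch of that route (assuming an undecorated copy of the question's function, here called plain_dosomething, and the same connected client), each future can be passed as the dummy argument of the next client.submit call, and the running tasks can be watched with progress:

from dask.distributed import progress

def plain_dosomething(name, *args):
    # same body as the question's dosomething, just without @delayed
    res = {"name": name, "beg": datetime.now()}
    sleep(np.random.randint(10))
    res.update(rand=np.random.rand(), end=datetime.now())
    return res

prev = None
futures = []
for name in ["foo", "bar", "baz"]:
    # the previous future is a real dependency of the next task,
    # so the chain runs strictly in order
    prev = client.submit(plain_dosomething, name, prev)
    futures.append(prev)

progress(futures)                 # live progress of the running tasks
results = client.gather(futures)  # blocks until the whole chain is done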

mdurant
  • I tried the dummy argument approach, but in a wrapper (which of course didn't work). Now that I see the solution, it seems obvious! :) – suvayu Feb 08 '19 at 05:46