
In the following toy example, I am trying to parallelize some nested for loops using dask delayed/compute. Is there any way to visualize the full task graph for it?

import time

from dask import compute, delayed


@delayed
def child(val):
    time.sleep(1)
    return val


@delayed
def p1(val):
    futs = []
    for i in range(5):
        futs += [child(val * i)]
    return compute(*futs)


@delayed
def p2(val):
    futs = []
    for i in range(10):
        futs += [p1(val * i)]
    return compute(*futs)


@delayed
def p3(val):
    futs = []
    for i in range(30):
        futs += [p2(val * i)]
    return futs


if __name__ == "__main__":
    f = p3(10)
    f.visualize()

For example, when I call the .visualize method on any of the delayed functions, it shows just one level (node?) and none of the nested calls or functions beneath it. For instance, p3(10).visualize() returns:

[Image: p3 task graph]

Perhaps I am using dask.delayed improperly here?

SultanOrazbayev
khubull

2 Answers


Building off Sultan's example in the other answer, visualize(p3(10)) returns the following task graph:

If you instead modify the functions to return a sum rather than a list:

import time

from dask import compute, delayed, visualize


@delayed
def child(val):
    time.sleep(1)
    return val


def p1(val):
    return sum([child(val * i) for i in range(2)])


def p2(val):
    return sum([p1(val * i) for i in range(3)])


def p3(val):
    return sum([p2(val * i) for i in range(4)])


if __name__ == "__main__":
    visualize(p3(10))  # saved to mydask.png by default

then visualize(p3(10)) returns a task graph with the tree structure I was expecting.

Perhaps my question should have been: what do the blank boxes in the task graph represent?
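As a quick check on the sum-based version, here is a minimal sketch that assumes the same p1/p2/p3 definitions but drops the time.sleep(1) so it finishes instantly. Because sum() over Delayed objects uses Delayed's arithmetic operators, each level collapses into a single Delayed, so p3(10) is one lazy object whose graph is the tree. The expected value follows from the arithmetic: p1(w) = 0 + w = w, p2(v) = 0 + v + 2v = 3v, so p3(10) = 0 + 30 + 60 + 90 = 180.

```python
from dask import delayed
from dask.delayed import Delayed


@delayed
def child(val):
    # time.sleep(1) omitted in this sketch so it runs immediately
    return val


def p1(val):
    # sum() starts from 0 and relies on Delayed.__radd__/__add__,
    # so the result is a single Delayed representing the additions
    return sum([child(val * i) for i in range(2)])


def p2(val):
    return sum([p1(val * i) for i in range(3)])


def p3(val):
    return sum([p2(val * i) for i in range(4)])


total = p3(10)
print(isinstance(total, Delayed))  # True: one lazy object, nothing has run yet
print(total.compute(scheduler="sync"))  # 180
```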

khubull
  • The rectangles represent the results of the functions (and the functions are represented by circles); this is how Dask's task graphs are visualized everywhere. :) – pavithraes Jan 24 '22 at 17:35

dask.visualize will show the task DAG; however, Dask cannot see what a delayed task does until that task is evaluated, so any delayed calls made inside it are not part of the graph it can plot. Running compute within the delayed function doesn't resolve this, since that compute call only happens once the outer task itself is executed.

As the Dask best practices recommend, you will want to avoid calling delayed functions from within other delayed functions.
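To make the difference concrete, here is a small sketch comparing a delayed function that creates delayed calls internally (the anti-pattern) with a plain function that creates them eagerly. The function names nested and flat are hypothetical, chosen only for this illustration. Counting the keys in each graph with __dask_graph__() shows that Dask only knows about the outer task in the nested case, and computing it just hands back the inner Delayed objects unevaluated.

```python
from dask import compute, delayed
from dask.delayed import Delayed


@delayed
def child(val):
    return val


@delayed
def nested(val):
    # Anti-pattern: child() is only called when this task runs,
    # so none of the child tasks are in the graph Dask sees up front.
    return [child(val * i) for i in range(3)]


def flat(val):
    # Plain function: child() is called immediately,
    # so every call becomes a node in the graph.
    return [child(val * i) for i in range(3)]


lazy_nested = nested(10)
lazy_flat = flat(10)

# Count the tasks Dask knows about before anything runs.
n_nested = len(dict(lazy_nested.__dask_graph__()))
n_flat = sum(len(dict(d.__dask_graph__())) for d in lazy_flat)
print(n_nested)  # 1: only the outer task
print(n_flat)  # 3: one task per child call

# Computing the nested version returns the inner Delayed objects, not their values.
(inner,) = compute(lazy_nested, scheduler="sync")
print(all(isinstance(x, Delayed) for x in inner))  # True
print(compute(*lazy_flat, scheduler="sync"))  # (0, 10, 20)
```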

The snippet below shows one way to modify the script:

import time

from dask import compute, delayed, visualize


@delayed
def child(val):
    time.sleep(1)
    return val


def p1(val):
    return [child(val * i) for i in range(2)]


def p2(val):
    return [p1(val * i) for i in range(3)]


def p3(val):
    return [p2(val * i) for i in range(4)]


if __name__ == "__main__":
    f = p3(10)
    visualize(f)
    # by default the DAG is saved to mydask.png (graphviz must be installed)
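Visualizing the graph does not run it; to get the actual values you still need to call compute(f), as pointed out in the comments below. A minimal sketch, assuming the same p1/p2/p3 as above but with time.sleep(1) omitted so it runs quickly: dask.compute walks the nested lists and returns a tuple with one result per argument, so compute(f) gives a 1-tuple holding the nested lists of values.

```python
from dask import compute, delayed


@delayed
def child(val):
    # time.sleep(1) omitted in this sketch so it runs instantly
    return val


def p1(val):
    return [child(val * i) for i in range(2)]


def p2(val):
    return [p1(val * i) for i in range(3)]


def p3(val):
    return [p2(val * i) for i in range(4)]


f = p3(10)  # nested lists of Delayed objects; nothing has run yet

# dask.compute traverses nested lists and returns one result per argument.
(results,) = compute(f)
print(len(results))  # 4: one entry per p2 call
print(results[1])  # p2(10): [[0, 0], [0, 10], [0, 20]]
```

Passing the outer list unpacked, as in compute(*f), would instead return a 4-tuple with one nested list per p2 call.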
SultanOrazbayev
  • This is a good answer, Sultan! One thing I'd add: we'll need to call `compute(f)` at the end to compute the results. Also, I'm not seeing the task graph on `visualize(f)` (or `visualize(*f)`). Any idea why? (I'll follow up if I figure it out!) – pavithraes Jan 21 '22 at 12:20
  • By default, the visualization is saved to `mydask.png`. – SultanOrazbayev Jan 21 '22 at 12:25
  • Yeah, that file is empty for me, no graphs :/ – pavithraes Jan 21 '22 at 12:28
  • Hmm, that's odd... – SultanOrazbayev Jan 21 '22 at 12:29
  • Thanks Sultan, visualize(f) works for me in a Jupyter notebook if I call it from another cell. However, the task graph was different from what I expected: all of the child functions ran in parallel with no branches or nodes. When I modified the toy example to return a sum rather than a list, it showed the tree structure I was expecting. – khubull Jan 21 '22 at 16:48
  • Following up: my package installs were broken; it works perfectly now. :) – pavithraes Jan 24 '22 at 08:23