I am trying to run a DAG of tasks using the dask API for my specific application. As a contrived example, I want tasks to return their success/failure flags and use those flags as inputs to other tasks.
However, dask does not let me use operations that call `__bool__` (such as `a and b`) on delayed objects. How is that different from bitwise boolean ops (i.e. `a & b`)?
Why is `__bool__` deliberately not supported, and how hard would it be to fix locally?
I tried digging into the source code, but I couldn't understand how `a & b` successfully returns a sub-graph of `('and_', 'a', 'b')`, while `a and b` does not return something like `('__bool__1', 'a'), ('__bool__2', 'b'), ('and_', '__bool__1', '__bool__2')`.
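For reference, this is how I understand plain Python treats the two operators, independent of dask. The `Lazy` class below is a hypothetical stand-in I wrote for illustration, not dask's actual implementation. `a & b` dispatches to `__and__`, which may return any object (here, a new graph node). `a and b` first evaluates `bool(a)`, and `__bool__` must return a real `True` or `False` immediately, so there seems to be no hook through which a lazy node could be returned.

```python
class Lazy:
    """Hypothetical stand-in for a delayed object (not dask's real class)."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # `a & b` lands here and is free to return a new lazy node.
        return Lazy(("and_", self.expr, other.expr))

    def __bool__(self):
        # `a and b` calls bool(a) first; Python requires a concrete bool
        # here, so a lazy object can only refuse.
        raise TypeError("truth value of a Lazy object is not known yet")


a, b = Lazy("a"), Lazy("b")

node = a & b  # builds a node, nothing is evaluated

try:
    a and b  # triggers a.__bool__()
    and_error = None
except TypeError as exc:
    and_error = str(exc)

print(node.expr)
print(and_error)
```

If that reading is right, the sub-graph I imagined for `a and b` cannot be produced by overriding `__bool__` alone, because the `and` keyword itself is not overloadable.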
Here is the simplest code I could come up with that reproduces the problem.

    import dask
    from time import sleep


    @dask.delayed
    def task(x, cond):
        if not cond:
            return False
        sleep(x)
        return True


    def run_graph():
        task1_done = task(2, True)
        task2_done = task(1, True)
        task3_done = task(1, task2_done)
        # This is the failing line: `and` calls bool() on a delayed object.
        all_done = task1_done and task3_done
        return all_done


    if __name__ == '__main__':
        done = run_graph()
        dask.compute(done)

If we replace the `and` operation with `&`, it works fine:

    all_done = task1_done & task3_done

This might not be an issue here, but I want to use the built-in `all()` and `any()` functions on a list of delayed flags, and those call `__bool__` internally.
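One workaround I am considering, sketched under the assumption that wrapping the built-ins with `dask.delayed` is acceptable: `all` and `any` then run inside the graph on the computed values, so `__bool__` is never called on a delayed object. The `task` below is a simplified variant of mine without the `sleep`, and `scheduler="synchronous"` is only there to keep the example deterministic.

```python
import dask


@dask.delayed
def task(x, cond):
    # Simplified variant of the task above: no sleep, just report the flag.
    return bool(cond)


flags = [task(2, True), task(1, True), task(1, False)]

# Wrap the built-ins so they become graph nodes; the list of delayed
# flags is resolved to plain booleans before all()/any() actually run.
all_done = dask.delayed(all)(flags)
any_done = dask.delayed(any)(flags)

all_result, any_result = dask.compute(
    all_done, any_done, scheduler="synchronous"
)
print(all_result, any_result)
```

This does not give short-circuit behaviour: every flag is computed before `all` or `any` sees it.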