
I am trying to run a DAG of tasks using the Dask API for my application. As a contrived example: I want tasks to return success/failure flags and use those flags as inputs to other tasks.

However, Dask does not let me use __bool__ on delayed objects (as in a and b), while the bitwise boolean operators (e.g. a & b) work. How are the two different?

Why is this unsupported, and how hard would it be to fix locally?

I tried digging into the source code, but I couldn't understand how a & b successfully returns a sub-graph like ('and_', 'a', 'b'), while a and b does not return something like ('__bool__1', 'a'), ('__bool__2', 'b'), ('and_', '__bool__1', '__bool__2').

Here is the simplest code that reproduces the problem:

import dask
from time import sleep

@dask.delayed
def task(x, cond):
    if not cond:
        return False
    sleep(x)
    return True

def run_graph():
    task1_done = task(2, True)
    task2_done = task(1, True)
    task3_done = task(1, task2_done)

    # Fails: `and` first calls bool(task1_done), which Dask does not allow on delayed objects
    all_done = task1_done and task3_done
    return all_done

if __name__ == '__main__':
    done = run_graph()
    dask.compute(done)

If I replace the and operation with &, it works fine:

all_done = task1_done & task3_done

That workaround is fine here, but I want to use the built-in all() and any() functions on a list of delayed flags, and those call __bool__ internally.
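A common way around this is to make all() and any() themselves lazy instead of calling them on delayed objects. In Dask that would look like dask.delayed(all)(flags) (dask.delayed looks inside list arguments for delayed values), or functools.reduce(operator.and_, flags), which only needs the & operator that already works. The sketch below shows both ideas on a tiny stdlib-only model so it runs without Dask; Node, delayed and compute here are hypothetical stand-ins, not Dask's implementation.

```python
import operator
from functools import reduce


class Node:
    """Toy lazy call (hypothetical, not Dask's API): func(*args), run only by compute()."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __and__(self, other):
        return Node(operator.and_, (self, other))

    def __or__(self, other):
        return Node(operator.or_, (self, other))

    def __bool__(self):
        raise TypeError("truth of a lazy Node is not supported")


def delayed(func):
    """Hypothetical stand-in for dask.delayed: calling the wrapper only records a Node."""
    return lambda *args: Node(func, args)


def compute(obj):
    """Evaluate Nodes recursively, also looking inside lists of arguments."""
    if isinstance(obj, Node):
        return obj.func(*(compute(arg) for arg in obj.args))
    if isinstance(obj, list):
        return [compute(item) for item in obj]
    return obj


flags = [delayed(bool)(1), delayed(bool)(0), delayed(bool)(1)]  # True, False, True once computed

# all(flags) would call bool() on each Node and raise; make all/any lazy instead
all_done = delayed(all)(flags)
any_done = delayed(any)(flags)

# Alternatively, fold with & and |, which only need __and__ and __or__
all_done_alt = reduce(operator.and_, flags)
any_done_alt = reduce(operator.or_, flags)

print(compute(all_done), compute(any_done))          # False True
print(compute(all_done_alt), compute(any_done_alt))  # False True
```

The reduce version builds a chain of and_/or_ nodes, one per flag; the delayed(all) version is a single node, which keeps the graph smaller for long lists.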

1 Answer


I don't know Dask in detail, but I suspect it simply implements __and__ on its objects. That does not convert the object to a boolean at all. This is unlike and, or, etc., which convert the left operand to a boolean first (a and b calls bool(a), then returns either a or b itself).

This can be quickly tested with a small test class:

In [1]: class Test:
    ...:     def __and__(self, other):
    ...:         print("And called!")
    ...:         return self
    ...:     def __bool__(self):
    ...:         print("Bool called!")
    ...:         return True
    ...:

In [2]: a = Test()

In [3]: b = Test()

In [4]: a & b
And called!
Out[4]: <__main__.Test at 0x7f5eb58f4eb8>

In [5]: a and b
Bool called!
Out[5]: <__main__.Test at 0x7f5eb587e400>
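To connect this to the graph question: Python routes a & b to a.__and__(b), so the object can return a brand-new node describing the operation. There is no dunder method for and/or/not at all; the interpreter calls bool() on the left operand and uses that True/False immediately to choose which operand to return, so there is no hook where something like ('__bool__1', 'a') could be substituted. The toy class below (hypothetical names, not Dask's Delayed) records & in a dict-based graph and refuses truth testing. As far as I can tell from the Dask source, Delayed gets its operator methods attached in a loop (Delayed._bind_operator) rather than written out one by one, which is why they are hard to spot.

```python
import operator


class Lazy:
    """Toy lazy value (hypothetical, not Dask's Delayed): a key into a shared task graph."""

    def __init__(self, graph, task):
        self.graph = graph
        self.key = f"node-{len(graph)}"
        graph[self.key] = task  # a task is a plain value or a tuple (func, *arg_keys)

    def __and__(self, other):
        # `a & b` is dispatched to a.__and__(b), so we can hand back a new graph node
        return Lazy(self.graph, (operator.and_, self.key, other.key))

    def __bool__(self):
        # `a and b` starts with bool(a), which needs a real True/False right now
        raise TypeError("truth value of a Lazy object is unknown until it is computed")


def compute(graph, key):
    """Recursively evaluate one key of the toy graph."""
    task = graph[key]
    if isinstance(task, tuple):
        func, *arg_keys = task
        return func(*(compute(graph, k) for k in arg_keys))
    return task


graph = {}
a = Lazy(graph, True)
b = Lazy(graph, False)

both = a & b                     # records (operator.and_, 'node-0', 'node-1'); nothing runs yet
print(compute(graph, both.key))  # False

try:
    a and b                      # Python calls a.__bool__() here
except TypeError as exc:
    print("a and b failed:", exc)
```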

Since Dask evaluates lazily (as I understand it), __bool__ would probably have to force immediate evaluation to work properly, whereas __and__ can return a lazy object, because it returns an object of the same type rather than a boolean.
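A quick way to check whether __bool__ could stay lazy, independent of Dask: in CPython, truth testing rejects any __bool__ result that is not an actual bool, and that applies to bool(), to and/or, and to all()/any(). The class below is a hypothetical probe, not anything from Dask.

```python
class LazyFlag:
    """Hypothetical flag that tries to stay lazy by returning itself from __bool__."""

    def __bool__(self):
        return self  # not a bool, so the interpreter rejects it


a, b = LazyFlag(), LazyFlag()

attempts = {
    "bool(a)": lambda: bool(a),
    "a and b": lambda: a and b,
    "all([a, b])": lambda: all([a, b]),
}
for label, attempt in attempts.items():
    try:
        attempt()
    except TypeError as exc:
        # CPython reports something like "__bool__ should return bool, returned LazyFlag"
        print(f"{label}: TypeError: {exc}")
```

All three attempts fail, so patching __bool__ locally cannot make and, all() or any() produce graph nodes; the lazy work has to happen through operators like & or through functions wrapped as tasks.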

Vorpal
  • Thanks, I understand that, but is there any way to make `__bool__` return lazy objects? Also, I don't understand how all of this works in Dask's Delayed data structure: it doesn't explicitly implement boolean or unary operations, yet it works well with them. – Kourosh Hakhamaneshi Aug 25 '19 at 18:50
  • @KouroshHakhamaneshi Reading https://docs.python.org/3.7/reference/datamodel.html#object.__bool__, it says it "should return False or True". It seems unlikely that it would work reliably with everything unless it returns an immediate value. Also, unlike with `__and__` etc. (where you know what the "other" object is), you have no clue about the context of the operation. Since I don't really know Dask itself, perhaps there is a more idiomatic way to do what you want in Dask? – Vorpal Aug 25 '19 at 19:06
  • Yeah, I guess my question now is: how do I control the construction of the task graph based on the results of earlier tasks in the graph? That's essentially what I want to do, e.g. not start a task until an earlier task has finished successfully, or start a different task if it failed. – Kourosh Hakhamaneshi Aug 25 '19 at 20:09
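One way to branch on earlier results is to decide the next step only once the earlier result is known, using futures rather than a graph fixed up front. Dask's distributed scheduler offers a futures interface (Client.submit, Future.result()) that extends Python's concurrent.futures; the sketch below uses the standard-library version so it runs without a cluster, and step and run_pipeline are hypothetical names. With dask.distributed, pool.submit would roughly become client.submit. The other option is to keep the branch inside a task, as task(x, cond) in the question already does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def step(seconds, succeed=True):
    """Hypothetical task: does some 'work', then reports success as a bool."""
    time.sleep(seconds)
    return succeed


def run_pipeline(first_succeeds):
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(step, 0.1, first_succeeds)
        independent = pool.submit(step, 0.1)  # does not depend on `first`, runs concurrently

        # .result() blocks until `first` is done, so the branch sees a real bool
        if first.result():
            branch, follow_up = "follow-up", pool.submit(step, 0.1)
        else:
            branch, follow_up = "fallback", pool.submit(step, 0.1)

        return branch, independent.result() and follow_up.result()


print(run_pipeline(True))   # ('follow-up', True)
print(run_pipeline(False))  # ('fallback', True)
```

Because .result() returns a plain value, ordinary if/and/all() work again; the trade-off is that the scheduler only learns about the follow-up task after the earlier one finishes.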