
Until now, I've used dask's get with a dictionary to define the dependency graph of my tasks. But this means I have to define the whole graph up front, and now I want to add new tasks from time to time (with dependencies on old tasks).

I've read about the distributed package, and it looks appropriate. I've seen two possible options to define my graph:

  1. Using delayed, and defining the dependencies between tasks:

    import dask
    from dask import delayed
    # f, g1 and g2 are my own task functions
    t1 = delayed(f)()
    t2 = delayed(g1)(t1)
    t3 = delayed(g2)(t1)
    dask.compute([t2, t3])
    
  2. Using map/submit, and doing something like:

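    from dask.distributed import Client
    client = Client()  # starts a local cluster when no address is given
    t1 = client.submit(f)
    t2 = client.map(g1, [t1])[0]
    t3 = client.map(g2, [t1])[0]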
    

What do you think is more appropriate? Thanks!

1 Answer


If your goal is to change your computation over time, then you should use Dask's futures API (modeled on concurrent.futures), described here:

http://dask.pydata.org/en/latest/futures.html
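As a rough illustration of that interface (a minimal sketch, not taken from the linked page), passing a future as an argument to client.submit is what declares the dependency, and more tasks can be submitted later against futures that already exist. The functions load, double, add_one and combine are hypothetical stand-ins for f, g1, g2 and a later task, and the in-process Client(processes=False) is an assumption made only to keep the example self-contained.

```python
from dask.distributed import Client

# Hypothetical task functions standing in for f, g1 and g2 from the question.
def load():
    return 10

def double(x):
    return 2 * x

def add_one(x):
    return x + 1

# Hypothetical task added later, depending on two earlier tasks.
def combine(a, b):
    return a + b

# Assumption: a small in-process cluster, used here only for demonstration.
client = Client(processes=False)

# First batch: each submit call starts scheduling the task right away.
t1 = client.submit(load)
t2 = client.submit(double, t1)   # depends on t1 because t1 is passed as an argument
t3 = client.submit(add_one, t1)  # also depends on t1

# Later on: add a new task that depends on the existing futures.
t4 = client.submit(combine, t2, t3)

result = t4.result()  # blocks until the whole chain has finished
client.close()
```

As long as the earlier futures are still referenced, their results stay available on the cluster, so new tasks can reuse them without recomputation.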

MRocklin
  • To be clearer: I have a graph (with ~500 nodes), and after a while I want to add another graph (where some tasks depend on the previous tasks). What is the best way to add each graph? Is it with futures? How do I declare that one future depends on another? – user1769471 Jun 15 '18 at 22:10
  • Yes, futures can do this easily. I recommend reading through the documentation pointed to above and then asking additional questions if things are still unclear. – MRocklin Jun 16 '18 at 14:01
  • I've read about futures, and now I understand how I can do it with them. But I have a further question: in terms of scheduling/performance, which is better? Defining all tasks as futures (using client.submit), or using delayed to define the graph? If I understand correctly, delayed is lazy, so the entire graph (with ~500 nodes) will be built before the computation, and client.submit is not lazy. – user1769471 Jun 17 '18 at 11:33
  • Neither is better. It is as you say, one is lazy and one is immediate. Otherwise they are both mostly the same. – MRocklin Jun 18 '18 at 15:27
  • Thanks a lot! I'll use client.submit, as it is a little bit easier. – user1769471 Jun 20 '18 at 07:28
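The lazy versus immediate distinction discussed in the comments can be sketched side by side. This is only an illustration under assumptions: inc is a hypothetical task function, and the in-process Client(processes=False) is chosen to keep the example self-contained. The delayed version builds the whole graph first and runs nothing until it is handed to client.compute, while each client.submit call starts scheduling its task immediately.

```python
import dask
from dask.distributed import Client

# Hypothetical task function used for both styles.
def inc(x):
    return x + 1

# Assumption: a small in-process cluster, used here only for demonstration.
client = Client(processes=False)

# Lazy style: these two lines only build a graph; no work happens yet.
lazy_a = dask.delayed(inc)(1)
lazy_b = dask.delayed(inc)(lazy_a)
# Handing the graph to the client starts it and returns a future.
future_from_lazy = client.compute(lazy_b)

# Immediate style: each call is sent to the scheduler as soon as it is made.
eager_a = client.submit(inc, 1)
eager_b = client.submit(inc, eager_a)

lazy_result = future_from_lazy.result()
eager_result = eager_b.result()
client.close()
```

Both styles end up as tasks on the same scheduler, so the choice is mostly about whether you want to build the graph up front or grow it as you go.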