
Is it possible to have multiple clients in dask? For instance, can I have multiple threads running with one client per thread, so that when one thread blocks, the others can continue? In this case, each client would have separate task graphs that don't depend on each other.

As a followup question, if this is possible, then how can I specify where to run a specific task? When I do dd.read_csv, then call compute, how do I know which client and its associated scheduler / workers is executing this?

jrdzha

1 Answer


Is it possible to have multiple clients in dask

Yes, this is possible. You could, for instance, run computations on one cluster and other computations on another, simultaneously.

can I have multiple threads running with one client per thread

It is not clients that run your work, but workers, so I'm not sure what you are asking.

when one thread blocks, the others can continue

Clients are largely asynchronous: few operations block, and it's up to you when you call them.
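As a minimal sketch of that point, assuming an in-process local cluster (the function `slow_double` and the cluster settings are made up for illustration): `client.submit` hands back a future right away, and only `fut.result()` waits.

```python
import time

from dask.distributed import Client


def slow_double(x):
    """Hypothetical stand-in for a task that takes a while."""
    time.sleep(0.2)
    return 2 * x


def submit_without_blocking():
    # Assumption: a small in-process cluster, just to keep the sketch light.
    with Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None) as client:
        fut = client.submit(slow_double, 21)  # returns a Future immediately
        # ... the calling thread is free to do other work here ...
        value = fut.result()                  # this is the call that blocks
    return value


if __name__ == "__main__":
    print(submit_without_blocking())
```

In a thread-pool server, each request thread would only wait at the point where it asks for the result.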

when I call compute, how do I know which client and its associated scheduler / workers is executing this

thing.compute() will use the default client, which is the most recently created one. The function dask.distributed.get_client() returns that client for you.

To pick which to use, you can use either of these:

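# Option 1: submit to a specific client and wait for the result
fut = client.compute(thing)
result = fut.result()  # or equivalently: result = client.gather(fut)

# Option 2: use the client as a context manager
# (note: leaving the block also closes this client)
with client:
    result = thing.compute()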
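Here is a hedged, self-contained sketch of choosing between two clients. The two in-process `LocalCluster` instances and the `add` task are assumptions made up for illustration. It also uses `Client.as_current()`, which is not mentioned above: as far as I know it makes a client the current one inside the block without closing it on exit.

```python
import dask
from dask.distributed import Client, LocalCluster


@dask.delayed
def add(x, y):
    """Hypothetical task; any dask collection would do."""
    return x + y


def run_on_two_clients():
    # Assumption: two small in-process clusters standing in for real ones.
    opts = dict(n_workers=1, threads_per_worker=1, processes=False,
                dashboard_address=None)
    with LocalCluster(**opts) as cluster_a, LocalCluster(**opts) as cluster_b:
        client_a = Client(cluster_a)
        client_b = Client(cluster_b)  # created last, so it is the default
        thing = add(1, 2)

        # Explicit: submit to client_a and wait for the future.
        explicit = client_a.compute(thing).result()

        # Scoped: make client_a the current client for this block only.
        with client_a.as_current():
            is_current = Client.current() is client_a
            scoped = thing.compute()

        # No client named: falls back to the default client.
        default = thing.compute()

        client_b.close()
        client_a.close()
    return {"explicit": explicit, "scoped": scoped,
            "default": default, "is_current": is_current}


if __name__ == "__main__":
    print(run_on_two_clients())
```

Passing the collection to `client.compute` is the most explicit of the three, since it does not depend on which client happens to be current in the calling thread.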
mdurant
  • Thanks, this answers my question. To clarify, I didn't mean that the blocking was coming from the clients. I'm running a REST server with a threadpool, and threads might block on disk access or just spend a significant amount of time on a task. – jrdzha Jun 02 '20 at 21:38
  • Hi, sorry to reopen this, I'm running into a different issue now. Since I have multiple clients running, while using the client.compute and client.gather to specify which client to run tasks on, some tasks that don't have a specified client (for instance dd.read_parquet) default to Dask's default client, which is the most recently instantiated one. This gives me an error when I try to specify another client to do additional processing... is there a way around this? Thanks! – jrdzha Jun 15 '20 at 04:15
  • If you can make a small reproducer in which something gets computed on the wrong client, you should file it as a GitHub issue on the distributed repo. – mdurant Jun 15 '20 at 13:09