5

I've read at the dask distributed documentation that:

Worker and Scheduler nodes operate concurrently. They serve several overlapping requests and perform several overlapping computations at the same time without blocking.

I've always thought single-thread concurrent programming is best suited for I/O expensive, not CPU-bound jobs. However I expect many dask tasks (e.g. dask.pandas, dask.array) to be CPU intensive.

Does distributed only use Tornado for client/server communication, with separate processes/threads to run the dask tasks? Actually dask-worker has --nprocs and --nthreads arguments so I expect this to be the case.

How do concurrency with Tornado coroutines and more common processes/threads processing each dask task live together in distributed?

dukebody
  • 7,025
  • 3
  • 36
  • 61

1 Answers1

4

You are correct.

Each distributed.Worker object contains a concurrent.futures.ThreadPoolExecutor with multiple threads. Tasks are run on this ThreadPoolExecutor for parallel performance. All communication and coordination tasks are managed by the Tornado IOLoop.

Generally this solution allows computation to happen separately from communication and administration. This allows parallel computing within a worker and allows workers to respond to server requests even while computing tasks.

Command line options

When you make the following call:

dask-worker --nprocs N --nthreads T

It starts N separate distributed.Worker objects in separate Python processes. Each of these workers has a ThreadPoolExecutor with T threads.

MRocklin
  • 55,641
  • 23
  • 163
  • 235
  • Thanks! Is there some place in the docs where this is explained, or do you believe that current docs could benefit from some extra info about this? – dukebody Oct 04 '16 at 21:24
  • Current best location for information like this lives here: http://distributed.readthedocs.io/en/latest/worker.html I'm sure it could be improved. It's tricky to balance between giving a lot of technical detail and making things understandable for common case readers. I wouldn't want to jump into Tornado immediately (most users don't care) but perhaps a section at the end or comments tastefully sprinkled throughout? – MRocklin Oct 05 '16 at 18:02