I've read at the dask distributed
documentation that:
Worker and Scheduler nodes operate concurrently. They serve several overlapping requests and perform several overlapping computations at the same time without blocking.
I've always thought single-thread concurrent programming is best suited for I/O expensive, not CPU-bound jobs. However I expect many dask tasks (e.g. dask.pandas
, dask.array
) to be CPU intensive.
Does distributed only use Tornado for client/server communication, with separate processes/threads to run the dask tasks? Actually dask-worker
has --nprocs
and --nthreads
arguments so I expect this to be the case.
How do concurrency with Tornado coroutines and more common processes/threads processing each dask task live together in distributed?