Local Dask lets you use the process-based scheduler. Workers in dask distributed use a ThreadPoolExecutor to compute tasks. Is it possible to replace the ThreadPoolExecutor with a ProcessPoolExecutor in dask distributed? Thanks.

Vladyslav Moisieienkov

1 Answer

The distributed scheduler allows you to work with any number of processes, via any of the deployment options. Each of these processes can run one or more threads, so you can choose whatever mix of threads and processes suits your workload.

The simplest expression of this is with the LocalCluster (same as Client() by default):

from dask.distributed import LocalCluster
cluster = LocalCluster(n_workers=W, threads_per_worker=T, processes=True)

This creates W worker processes with T threads each (T can be 1).
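As a minimal sketch of that setup, assuming the `distributed` package is installed, the following starts a local cluster with one thread per worker and asks each worker for its process ID. The helper name `worker_pids` and the values 2 and 1 are illustrative choices, not part of the original answer.

```python
import os

from dask.distributed import Client, LocalCluster


def worker_pids(n_workers=2, threads_per_worker=1):
    """Start a local cluster and return the PID of every worker process.

    Hypothetical helper for illustration only.
    """
    with LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        processes=True,          # one OS process per worker
        dashboard_address=None,  # skip the diagnostics dashboard
    ) as cluster, Client(cluster) as client:
        # client.run calls the function once on each worker and
        # returns a dict keyed by worker address.
        return sorted(client.run(os.getpid).values())


if __name__ == "__main__":
    print("client process:", os.getpid())
    print("worker processes:", worker_pids())
```

Each worker should report a PID different from the client's, so with `threads_per_worker=1` every task runs alone in its own process. The `if __name__ == "__main__":` guard matters here because worker processes may be started by spawning, which re-imports the main module.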

As things stand, the implementation of workers uses a thread pool internally, and you cannot swap in a process pool in its place.
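To illustrate what running tasks in a thread pool means in practice, here is a standard-library sketch. It is not Dask's worker code, and the names `where_am_i` and `run_in_threads` are made up for the example. It shows that every task submitted to a ThreadPoolExecutor runs inside the same process as the code that created the pool.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def where_am_i(_):
    """Return the (process id, thread id) the task is running in."""
    return os.getpid(), threading.get_ident()


def run_in_threads(n_tasks=4, n_threads=2):
    """Run n_tasks small tasks on a pool of n_threads threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(where_am_i, range(n_tasks)))


if __name__ == "__main__":
    print("host process:", os.getpid())
    for pid, tid in run_in_threads():
        print("task ran in process", pid, "thread", tid)
```

All tasks report the PID of the host process and differ only in thread ID, so a signal sent to that PID is delivered to the whole process rather than to a single task.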

mdurant
  • Thanks for your answer. So, with this setup, my tasks will be launched in the process pool, not in the thread pool? – Vladyslav Moisieienkov Jan 11 '19 at 09:03
  • Just to add: the idea is to use ProcessPoolExecutor instead of ThreadPoolExecutor. – Vladyslav Moisieienkov Jan 11 '19 at 10:04
  • I am saying, you do not need to bother yourself with that. Why do you want to change such a low-level implementation detail? – mdurant Jan 11 '19 at 13:46
  • It's related to https://stackoverflow.com/questions/54077457/send-sigterm-to-the-running-task-dask-distributed?noredirect=1#comment95092441_54077457 . ProcessPool is more suitable. But I have already understood that it's not possible to easily replace ThreadPool with ProcessPool. – Vladyslav Moisieienkov Jan 11 '19 at 14:27
  • "ProcessPool is more suitable" - really, I wouldn't mess with this. You already have multiple processes, this sounds like a recipe for a cascade. – mdurant Jan 11 '19 at 14:31
  • I agree that ProcessPool is generally not a good solution, but for my purpose I needed it. Could you just add to your answer that ThreadPool cannot be replaced with ProcessPool for running tasks? I will accept the answer and we can close the question. Thanks. – Vladyslav Moisieienkov Jan 15 '19 at 10:39