
I have come across an issue where the Dask scheduler gets killed with a memory error (though the workers keep running) if a large number of tasks is submitted in a short period of time.

If it were possible to get the current number of tasks on the cluster, it would be easy to limit how many concurrent tasks are submitted to it.

NOTE: Tasks are being submitted to the same scheduler from multiple clients.
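
For reference, this is the kind of back-pressure I would like each client to apply. It is only a minimal sketch: the scheduler address, the process function, and the MAX_IN_FLIGHT cap are placeholders I made up, and the pattern relies on distributed.as_completed:

    from dask.distributed import Client, as_completed

    client = Client("tcp://scheduler-host:8786")  # placeholder address

    def process(x):
        return x * 2  # stand-in for the real work

    MAX_IN_FLIGHT = 10_000  # made-up cap; would need tuning

    items = iter(range(1_000_000))

    # Submit an initial window of tasks.
    futures = [client.submit(process, x)
               for _, x in zip(range(MAX_IN_FLIGHT), items)]
    ac = as_completed(futures)

    # Each time a task finishes, top the window back up, so this
    # client never has more than MAX_IN_FLIGHT tasks in flight.
    for finished in ac:
        result = finished.result()
        nxt = next(items, None)
        if nxt is not None:
            ac.add(client.submit(process, nxt))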

Santosh Kumar

1 Answer


You can run arbitrary Python functions on the scheduler with the client.run_on_scheduler method.

Using this, you can inspect any of the scheduler's state you like.

client.run_on_scheduler(lambda dask_scheduler: dask_scheduler.tasks)
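
If all you need is the count rather than the task states themselves, a small variant of the line above returns just the length, which avoids shipping the whole mapping back to the client:

client.run_on_scheduler(lambda dask_scheduler: len(dask_scheduler.tasks))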

Note that the scheduler can handle millions of tasks. If you're getting anywhere close to that, you should probably rethink how you're using Dask. For optimal performance, choose tasks that take hundreds of milliseconds or more.
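
If your per-item work is much cheaper than that, one common way to get there (a general Dask pattern, sketched here with made-up names and a made-up batch size) is to group many small items into a single task, so the scheduler tracks thousands of tasks instead of millions:

    from dask.distributed import Client

    client = Client("tcp://scheduler-host:8786")  # placeholder address

    def process_one(x):
        return x * 2  # stand-in for a very cheap per-item operation

    def process_batch(batch):
        # One task now covers many items, amortizing the
        # per-task overhead on the scheduler.
        return [process_one(x) for x in batch]

    data = list(range(1_000_000))
    batch_size = 1_000  # made-up; pick it so one task runs for >= ~100 ms

    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    futures = client.map(process_batch, batches)  # 1,000 tasks, not 1,000,000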

MRocklin
  • Thanks MR for the reply and information you shared. – Santosh Kumar Sep 17 '17 at 00:52
  • @MR, is there any recommended infrastructure configuration for the scheduler node? Currently I am running the scheduler on 56 GB of RAM, with two worker nodes with 56 GB of RAM and 16 cores each. – Santosh Kumar Sep 17 '17 at 00:54
  • Generally the scheduler shouldn't need that much RAM. It's a single-threaded process that, under ideal conditions, doesn't handle that much data. – MRocklin Sep 17 '17 at 01:13