
I know it's very specific to the environment running the code, but given that Dask calculates its execution plan in advance as a DAG, is there a way to estimate how long that execution should take?

The progress bar is a great help once execution is running, but would it be possible to estimate beforehand how long a series of operations should take?

mobcdi

1 Answer


Short Answer

No.

Explanation

The Dask scheduler just executes Python functions. It doesn't think about where they came from or the broader context of what they represent (for example, a dataframe join or matrix multiply). From its perspective it has just been asked to execute a graph of opaque function calls. This generality is a weakness (hard to perform high level analysis) but also Dask's main strength, because it can be applied to a broad variety of problems outside of any particular domain or specialty.
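To make the point concrete, here is a minimal sketch (not Dask's actual implementation) of what "executing a graph of opaque function calls" means: tasks are stored in a plain dict, each entry is either a literal or a `(function, *dependency_keys)` tuple, and the executor simply calls functions without any idea what they represent.

```python
def execute(graph, key, cache=None):
    """Recursively compute `key` from a dict-of-tasks graph."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *deps = task
        result = func(*(execute(graph, d, cache) for d in deps))
    else:
        result = task  # a literal value
    cache[key] = result
    return result

def inc(x):
    return x + 1

def add(x, y):
    return x + y

# The executor sees only opaque callables; it cannot tell whether
# `add` is part of a dataframe join or a matrix multiply.
graph = {"a": 1, "b": (inc, "a"), "c": (add, "a", "b")}
print(execute(graph, "c"))  # → 3
```

Because nothing about `inc` or `add` is inspected, the same machinery works for any Python functions, which is why a priori runtime analysis is hard.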

The distributed scheduler does maintain an exponentially weighted average of each function's duration, which could be used to build a runtime estimate for a task graph. If you're interested in building this, search the scheduler.py file for task_duration.
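The idea behind that approach can be sketched in a few lines. This is a hedged illustration, not Dask's scheduler code: the class name `DurationEstimator` and its methods are made up for this example. It keeps an exponentially weighted moving average per function name and sums those averages over a planned list of tasks to give a rough serial-runtime estimate.

```python
import time

class DurationEstimator:
    """Toy EWMA-based runtime estimator (illustrative, not Dask's API)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample
        self.avg = {}       # function name -> EWMA duration in seconds

    def record(self, name, duration):
        if name in self.avg:
            self.avg[name] = self.alpha * duration + (1 - self.alpha) * self.avg[name]
        else:
            self.avg[name] = duration

    def timed_call(self, func, *args):
        """Run `func`, recording its observed duration."""
        start = time.perf_counter()
        result = func(*args)
        self.record(func.__name__, time.perf_counter() - start)
        return result

    def estimate(self, task_names):
        """Rough serial estimate: sum EWMA durations of the planned tasks."""
        return sum(self.avg.get(name, 0.0) for name in task_names)

est = DurationEstimator()
est.record("load_chunk", 2.0)
est.record("load_chunk", 4.0)   # EWMA becomes 0.5*4.0 + 0.5*2.0 = 3.0
print(est.estimate(["load_chunk", "load_chunk"]))  # → 6.0
```

Note this ignores parallelism, data-dependent runtimes, and tasks never seen before, which is part of why the short answer above is "No."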

MRocklin