We have a dask compute graph (quite custom so we use dask delayed instead of collections). I've read in the docs that current scheduling policy is LIFO so that a worker process has big chances to get the data it has just computed for further steps down the graph. But as far as I understood task computation results are still (de)serialized to hard drive in even in this case.
So the question is how much performance gain would I get trying to keep as little tasks as possible down a single path of independent computations in a graph:
A) many small "map" tasks along each path
t --> t --> t -->...
some reduce stage
t --> t --> t -->...
B) one huge "map" task along for each path
T ->
some reduce stage
T ->
Thank you!