It is possible to save most of the dashboard information in a file with performance_report (see docs):
from dask.distributed import Client, performance_report
import time

client = Client()

def f(x):
    time.sleep(x)
    return x

# the report (dask-report.html by default) is written when the context manager exits
with performance_report():
    futures = client.map(f, range(5))
    results = client.gather(futures)
Note that the file is only saved once the computation has stopped (either because it completed or because of an error/interruption).
If you want something that saves information at specific steps/intervals, the simplest solution is to split the computation into chunks and save a report for each chunk, as in the sketch below. Note that such hacky saves will reduce the efficiency of the parallelisation, since some workers might end up idle during the frequent .gather or .compute operations.
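A minimal sketch of that chunked approach (the chunk boundaries and report file names are arbitrary assumptions):

import time
from dask.distributed import Client, performance_report

client = Client()

def f(x):
    time.sleep(x)
    return x

# hypothetical chunking: write a separate report for each batch of inputs
chunks = [range(0, 5), range(5, 10)]
for i, chunk in enumerate(chunks):
    with performance_report(filename=f"dask-report-chunk-{i}.html"):
        futures = client.map(f, chunk)
        results = client.gather(futures)  # blocking here is what may leave workers idle

Each pass through the loop blocks on client.gather, so the report for one chunk is written before the next chunk is submitted.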
Another option is to periodically dump the contents of client.get_task_stream(), which returns a tuple containing dictionaries, one for each task on the scheduler. This can be done either after completion of each future with as_completed, or at some fixed interval (e.g. every n seconds within a while loop). I can't think of a fully general solution, but something along these lines should work; see the sketch below.
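A rough sketch of the as_completed variant (the dump file name and the JSON serialisation, including default=str for values that aren't natively serialisable, are my own assumptions):

import json
import time
from dask.distributed import Client, as_completed

client = Client()

def f(x):
    time.sleep(x)
    return x

client.get_task_stream()  # prime the task-stream plugin on the scheduler
futures = client.map(f, range(5))

# overwrite the dump with the latest snapshot each time a future finishes;
# a while loop that checks future.done() and sleeps n seconds works analogously
for future in as_completed(futures):
    tasks = client.get_task_stream()
    with open("task-stream-dump.json", "w") as fh:
        json.dump(list(tasks), fh, default=str)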
You might also find the links in this answer useful.