My Dask computation is slow. When I look at the status page of the diagnostics dashboard I see that most of the time is spent in disk-read-* and disk-write-* tasks.
What does this mean?
How do I diagnose this issue?
When Dask workers start to run out of memory, they write extra data to disk. This is recorded on the status page as a disk-write- task. When that data is needed again, it is read back from disk, and a disk-read- task is shown on the status page. You can confirm this by looking at the upper-left plot, which shows memory use per worker, or at the solid portion of the progress bars, which shows how many tasks of each type still have their results held in memory.
Ways you can address this:
persist a lot of data