First, read this question: Repeated task execution using the distributed Dask scheduler
When Dask reruns a task because of work stealing or a task failure (for example, a worker exceeding its per-process memory limit), which task result gets passed to the next node of the DAG? We are using nested tasks, e.g.
import dask

@dask.delayed
def add(n):
    return n + 1

# Each call wraps the previous delayed result, forming a chain in the DAG
t_a = add(1)
t_b = add(t_a)
the_output = add(add(add(t_b)))
So if one of these tasks fails, or gets stolen, and is run twice, which result gets passed to the next node in the DAG?
Further background for those interested:
The reason this has come up is that our task writes to a database. If it runs twice, we get an integrity error because it is trying to insert the same record twice (constrained on id and version in combination). The current plan is to make the task idempotent by catching the integrity error in the task, but I still don't understand how Dask "chooses" a result.
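For illustration, here is a minimal sketch of the "catch the integrity error" plan. It uses the standard-library sqlite3 module with an in-memory database rather than our real database, and the table name `records`, the `value` column, and the helper `write_record` are hypothetical names made up for this example. The second call stands in for Dask running the same task a second time.

```python
import sqlite3


def make_connection():
    # In-memory SQLite database, used here only as a stand-in (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER NOT NULL,"
        " version INTEGER NOT NULL,"
        " value INTEGER,"
        " PRIMARY KEY (id, version))"  # unique on id and version in combination
    )
    return conn


def write_record(conn, record_id, version, value):
    """Insert a row; return True if inserted, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO records (id, version, value) VALUES (?, ?, ?)",
                (record_id, version, value),
            )
        return True
    except sqlite3.IntegrityError:
        # The same (id, version) was already written by an earlier run.
        return False


conn = make_connection()
first = write_record(conn, 1, 1, 42)
second = write_record(conn, 1, 1, 42)  # simulates the task being run twice
row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

With this shape, a repeated run leaves the table unchanged instead of raising, so the write itself no longer depends on which run's result Dask keeps.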