I’m trying to parallelize the following code (MCVE) by creating a task graph using dask.delayed
(or by implementing a computational graph myself):
import os

os.chdir('./kitchen1')
write_dough() # writes file ./dough
write_topping() # writes file ./topping
write_pizza() # requires ./dough and ./topping; writes ./pizza
I see 2 difficulties:
- write_dough doesn't return anything. z=x+y makes the dependency between variables clear; this doesn't. Dask doesn't recommend relying on side effects. Is there an idiomatic solution?
- os.chdir. How do I incorporate it into a computation graph?

I am not concerned about parallelizing file IO, performance, etc.
Here’s my current solution. It adds complexity, and './kitchen1' is everywhere, which is ugly. What would an elegant solution be?
import dask

write_dough, write_topping, write_pizza = map(dask.delayed, (write_dough, write_topping, write_pizza))
dough = write_dough('./kitchen1')
topping = write_topping('./kitchen1')
pizza = write_pizza(dough, topping, './kitchen1')
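For concreteness, here is a self-contained sketch of one possible variation on the approach above. It is only an illustration under assumptions: the function bodies are hypothetical stand-ins (the real ones are not shown here), a temporary directory stands in for ./kitchen1, and each function is assumed to return the path it wrote. With that assumption, write_pizza can derive the directory from its inputs, so the directory only has to be passed to the two upstream tasks and no os.chdir is needed.

```python
import tempfile
from pathlib import Path

import dask


@dask.delayed
def write_dough(kitchen):
    # Hypothetical body: write the file, then return its path so that
    # downstream tasks depend on a return value, not on a side effect.
    path = Path(kitchen) / "dough"
    path.write_text("dough")
    return path


@dask.delayed
def write_topping(kitchen):
    # Hypothetical body, same pattern as write_dough.
    path = Path(kitchen) / "topping"
    path.write_text("topping")
    return path


@dask.delayed
def write_pizza(dough, topping):
    # dask passes in the computed results of the upstream tasks (the two
    # paths); the output is written next to the dough file.
    path = dough.parent / "pizza"
    path.write_text(dough.read_text() + "+" + topping.read_text())
    return path


kitchen = tempfile.mkdtemp()  # stands in for ./kitchen1
dough = write_dough(kitchen)
topping = write_topping(kitchen)
pizza = write_pizza(dough, topping)  # edges in the graph come from the arguments
result = pizza.compute()
```

Nothing is written until pizza.compute() is called; because dough and topping do not depend on each other, the scheduler is free to run them concurrently before write_pizza.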