I’m trying to parallelize the following code (MCVE) by creating a task graph using dask.delayed (or by implementing a computational graph myself):

import os

os.chdir('./kitchen1')
write_dough()   # writes file ./dough
write_topping() # writes file ./topping
write_pizza()   # requires ./dough and ./topping; writes ./pizza

I see 2 difficulties:

  1. write_dough doesn't return anything. An expression like z = x + y makes the dependency between variables explicit; a function that only writes a file doesn't. Dask doesn’t recommend relying on side effects. Is there an idiomatic solution?
  2. os.chdir. How do I incorporate it into a computation graph?

I am not concerned about parallelizing file I/O, performance, etc.

Here’s my current solution. It adds complexity, and './kitchen1' is everywhere, which is ugly. What would an elegant solution be?

import dask

write_dough, write_topping, write_pizza = map(dask.delayed, (write_dough, write_topping, write_pizza))

dough = write_dough('./kitchen1')
topping = write_topping('./kitchen1')
pizza = write_pizza(dough, topping, './kitchen1')
stalostan

1 Answer

I would recommend your current approach of passing through dependencies explicitly.
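
A minimal sketch of what that could look like, in case it helps. The function bodies, the file contents, and the make_pizza helper are placeholders of mine, not anything from the question. The idea is that each writer returns the path it wrote, so the file becomes an ordinary return value that the next task consumes, and the kitchen directory is passed as an argument instead of being set with os.chdir (the working directory is shared by the whole process, so changing it inside tasks that may run in parallel is fragile).

```python
import tempfile
from pathlib import Path

import dask


@dask.delayed
def write_dough(kitchen):
    # Placeholder body: write the file, then return its path so the
    # output is visible to the task graph.
    path = Path(kitchen) / "dough"
    path.write_text("dough")
    return path


@dask.delayed
def write_topping(kitchen):
    path = Path(kitchen) / "topping"
    path.write_text("topping")
    return path


@dask.delayed
def write_pizza(dough, topping):
    # dough and topping arrive here as the concrete paths returned above,
    # so this task cannot start before both files exist.
    pizza = dough.parent / "pizza"
    pizza.write_text(dough.read_text() + "+" + topping.read_text())
    return pizza


def make_pizza(kitchen):
    # Hypothetical helper: builds the lazy graph for one kitchen,
    # so the directory is named in exactly one place.
    dough = write_dough(kitchen)
    topping = write_topping(kitchen)
    return write_pizza(dough, topping)


# A temporary directory stands in for './kitchen1'.
kitchen = Path(tempfile.mkdtemp())
pizza_path = make_pizza(kitchen).compute()
```

With this shape the directory appears only at the call site, and the dough and topping tasks have no dependency on each other, so the scheduler is free to run them concurrently.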

MRocklin