Multiprocessing with python and dependencies

Question

I have a program with 10 functions, and I want to take profit of concurrency to make it more efficient. I have extracted the dependencies between functions, which are the following:

f1 <- f2,f3,f4,f5
f2 <- f6
f3 <- f7,f8,f9
f4 <- f10
f5 <- f10
f8 <- f10
f9 <- f10

Can I achieve this with the multiprocessing library?

Can anybody give me an snippet of code to start from it?

My question is quite similar to this one, but I would like to get it using the build in Python libraries.

Parallel Tasking Concurrency with Dependencies on Python like GNU Make

Thanks,

An answer for the same problem but in java: https://stackoverflow.com/q/63354899/1925388 — Anmol Singh Jaggi, May 08 '21 at 15:39

Kyle Dewey · Answer 1 · 2013-09-28T21:28:07.883

I'm not 100% sure what you're asking for, but any task of this sort can be approached in the following way. Given the dependencies you provided, a dependency graph can be constructed using topological sort. Using this information, it is possible to immediately determine which nodes have no dependencies, i.e. those without incoming edges. These nodes can all be processed in parallel. Once a node has been processed, all descendent nodes can be marked that the given dependency has been satisfied. Once all dependencies for a node have been satisfied, then the node can be run.

In your case, running a node means performing a function call. As such, instead of simply marking that the dependency has been satisfied, you may want to store the result of the function call.

As an aside, without seeing the functions, this may or may not actually yield any performance benefit. Generally, this sort of parallelism is too fine-grained; more time is spent performing parallel coordination as opposed to actually running work in parallel.

---EDIT---

I wrote a small Scala library that I believe performs what you want. Unfortunately, a similar elegant solution isn't possible in CPython, since it doesn't support proper multithreading. It's still possible but clumsy; the whole framework would need to be written in a master-slave fashion. This also limits parallelism, as the master acts as a bottleneck for the processing of new tasks.

Thanks Kyle for the answer and sorry for not being clear. I am 100% agree with your answer. But my question is: once I have the dependency graph, how can I program well taking profit of concurrence? One example: 4 tasks: t3 depends on t1 and t4 depends on t2. How would you write the code in a proper way? If I write: ...t1.start(), t2.start(), t1.join(), t3.start(), t2.join(), t4.start()... then if t2 is faster than t1 the code is not efficient. Probably it doesn't exist, but a high level tool that performs the task with only the functions and the dep, graph as a params would be fantastic! — Miquel, Sep 21 '13 at 19:17
I'm unfortunately not aware of anything that works quite like this, though I've started to work on a toy project to implement this. — Kyle Dewey, Sep 22 '13 at 17:58

Multiprocessing with python and dependencies

1 Answers1