I'm not 100% sure what you're asking for, but any task of this sort can be approached in the following way. Given the dependencies you provided, a dependency graph can be constructed using topological sort. Using this information, it is possible to immediately determine which nodes have no dependencies, i.e. those without incoming edges. These nodes can all be processed in parallel. Once a node has been processed, all descendent nodes can be marked that the given dependency has been satisfied. Once all dependencies for a node have been satisfied, then the node can be run.
In your case, running a node means performing a function call. As such, instead of simply marking that the dependency has been satisfied, you may want to store the result of the function call.
As an aside, without seeing the functions, this may or may not actually yield any performance benefit. Generally, this sort of parallelism is too fine-grained; more time is spent performing parallel coordination as opposed to actually running work in parallel.
---EDIT---
I wrote a small Scala library that I believe performs what you want. Unfortunately, a similar elegant solution isn't possible in CPython, since it doesn't support proper multithreading. It's still possible but clumsy; the whole framework would need to be written in a master-slave fashion. This also limits parallelism, as the master acts as a bottleneck for the processing of new tasks.