There is a "killer" feature in the requirements that prevents an easy solution. There are not one but two directions of constraints:
- task A must be performed after task B => A "depends" on B
- task A must be performed before task B => B "depends" on A
In the real-world situation I'm facing, these dependencies are even more complicated, but that doesn't matter: all of them can be broken down into a situation analogous to the one just described.
Now the algorithm:
Step 1: Compile all the constraints into a set of single-direction dependencies for each task. Single-direction dependencies are easy to handle. The first idea of building a graph first, performing cycle detection second and then performing topological sorting (thanks for the term, Dimitry) as separate steps can then be abandoned altogether. Each task ends up with a set of dependencies:
- If task A must be performed after task B we store "B" in the dependency set of A.
- If task A must be performed before task B we store "A" in the dependency set of B.
While doing this we can also run sanity checks on the dependencies, for example that every referenced task actually exists and that no task depends on itself. If something is wrong in the constraint specifications, it is easy to detect in this step.
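Step 1 could be sketched roughly like this in Python. The `(task, relation, other)` tuple encoding, the function name `compile_dependencies` and the particular sanity checks are assumptions made for this sketch; the real constraint format will look different.

```python
def compile_dependencies(tasks, constraints):
    """Turn two-directional constraints into one dependency set per task.

    tasks:       iterable of task names
    constraints: iterable of (task, relation, other) tuples, where relation
                 is "after" or "before" (hypothetical encoding for this sketch)
    """
    deps = {t: set() for t in tasks}
    for task, relation, other in constraints:
        # Sanity checks: both tasks must be known and must differ.
        for name in (task, other):
            if name not in deps:
                raise ValueError(f"unknown task: {name!r}")
        if task == other:
            raise ValueError(f"task {task!r} is constrained against itself")
        if relation == "after":
            # task must run after other => task depends on other
            deps[task].add(other)
        elif relation == "before":
            # task must run before other => other depends on task
            deps[other].add(task)
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return deps
```

For example, `compile_dependencies(["A", "B", "C"], [("A", "after", "B"), ("C", "before", "B")])` stores "B" in the dependency set of A and "C" in the dependency set of B, while C ends up with an empty set.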
Step 2: Now we have a very simple problem, as there are only one-way dependencies. These can be considered preconditions: a task can only be performed once all its preconditions are met, that is, once every task in its dependency set has already been scheduled. Now we can proceed as follows:
pack all tasks into a list named /notYetProcessed/
create empty list /result/
while there are still tasks to process in /notYetProcessed/ do:
    create empty list /remaining/
    for all tasks /t/ in /notYetProcessed/ do:
        if all dependencies are met for /t/ then do:
            add /t/ to /result/
        else do:
            add /t/ to /remaining/
    if /length(notYetProcessed)/ matches /length(remaining)/ then do:
        terminate with error
    /notYetProcessed/ = /remaining/
After the outer while loop has terminated, /result/ will contain a list of tasks to process in an order that follows all constraints as defined in the beginning.
If the above algorithm terminates with an error that means:
* no task could be processed in this loop
* which means: some dependencies could not be resolved
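* which means: a task dependency cycle exists among the remaining tasks (every remaining task is either part of a cycle or waits, directly or indirectly, for one), provided step 1 already rejected dependencies on unknown tasks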
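The loop above translates almost line by line into Python. This is only a sketch: it assumes the dependencies are already available as a dict mapping each task to the set of tasks it depends on, the name `order_tasks` is made up, and the extra `done` set is my addition so that the "all dependencies are met" test does not have to search the result list.

```python
def order_tasks(deps):
    """Return the tasks in an order that satisfies all dependencies.

    deps: dict mapping each task to the set of tasks it depends on
          (assumed input format for this sketch)
    """
    not_yet_processed = list(deps)
    result = []
    done = set()  # same content as result, kept as a set for fast lookups
    while not_yet_processed:
        remaining = []
        for t in not_yet_processed:
            if deps[t] <= done:
                # all dependencies of t are already in the result
                result.append(t)
                done.add(t)
            else:
                remaining.append(t)
        if len(remaining) == len(not_yet_processed):
            # no task could be processed in this pass
            raise ValueError(f"unresolvable dependencies among: {remaining}")
        not_yet_processed = remaining
    return result
```

With `{"A": {"B"}, "B": {"C"}, "C": set()}` as input this returns `["C", "B", "A"]`, and with `{"A": {"B"}, "B": {"A"}}` it raises the error because neither task can ever be processed.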
Step 3: Now process all tasks one by one in the order stored in /result/ and you'll be fine.
As you can see, this can be done without even building a special graph representation of the data. Topological sorting is performed "directly" on the data by accepting the first solution (= the first of all possible sort orders) we can get our hands on.
There might be even more efficient algorithms to solve this (after the initial dependency compilation has been performed) that I'm not aware of. If so, I'd be happy to learn about them!