Algorithm to order tasks with dependencies

Question

In a private open-source project I have run into the following problem:

There are a variety of tasks to perform. Some of these tasks have annotations stating that

they must be performed after one or more specific other tasks
they must be performed before one or more specific other tasks

I'm looking for a simple algorithm to build a directed graph from that information. The graph could then be used for cycle detection and for performing all tasks in an order that respects all these conditions on their order of execution.

Q: What would be an efficient, good way to build such a graph?

Thank you for your help.

Note: Clearly we will need two extra nodes in the graph, a start node and an end node. Let's name them START and END. A task without dependencies clearly ends up in a construct such as START -> A -> END. What is less clear to me is a good way to end up with a sequence START -> B -> C -> END, given that B must be followed by C, without getting an edge from B to END and without an edge from START to C.

In general there is not *one* directed graph for this. Multiple incompatible solutions exist and should all be considered when trying to find all valid task sequences. So if that is the purpose, you cannot depend on one single directed graph (in general). — trincot, Jan 28 '18 at 19:32
Hm. I just need ONE task sequence, so I'm fine with the first solution that respects all constraints. — Regis May, Jan 28 '18 at 21:00

score 2 · Answer 1 · answered Jan 29 '18 at 08:14

There is a "killer" feature in the requirements that prevents an easy solution. There are not one but two directions of constraints:

task A must be performed after task B => A "depends" on B
task A must be performed before task B => B "depends" on A

In the real-world situation I'm facing, these dependencies are even more complicated, but that doesn't matter: all of them can be broken down into an analogous situation as just described.

Now the algorithm:

Step 1: Compile all the constraints of each task into a set of one-directional dependencies for that task. One-directional dependencies can then be handled very easily. The original idea of first building a graph, then performing cycle detection, and then performing topological sorting (thanks for the term, Dimitry) can be abandoned altogether. Each task ends up with a set of dependencies:

If task A must be performed after task B we store "B" in the dependency set of A.
If task A must be performed before task B we store "A" in the dependency set of B.

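While doing this we can also run sanity checks on these dependencies. Examples include references to unknown tasks, a task that depends on itself, or two tasks that must each precede the other. Errors like these in the constraint specifications are easy to detect in this step.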
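A minimal Python sketch of Step 1 follows. It assumes the annotations arrive as two hypothetical dictionaries: `after` maps a task to the tasks it must follow, and `before` maps a task to the tasks it must precede. The function name `compile_dependencies` and the particular sanity checks (unknown names, self-dependencies, direct contradictions) are illustrative assumptions and are not taken from the answer itself.

```python
def compile_dependencies(
    tasks: set[str],
    after: dict[str, set[str]],
    before: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Turn two-directional annotations into one dependency set per task.

    after[a]  = tasks that a must be performed after  -> stored in deps[a]
    before[a] = tasks that a must be performed before -> a stored in deps[b]
    """
    deps: dict[str, set[str]] = {t: set() for t in tasks}

    def add(task: str, dependency: str) -> None:
        # Sanity checks: both names must be known and distinct.
        for name in (task, dependency):
            if name not in tasks:
                raise ValueError(f"unknown task {name!r}")
        if task == dependency:
            raise ValueError(f"task {task!r} depends on itself")
        deps[task].add(dependency)

    for a, others in after.items():
        for b in others:
            add(a, b)  # a after b  => b in deps[a]
    for a, others in before.items():
        for b in others:
            add(b, a)  # a before b => a in deps[b]

    # Direct contradictions (a needs b and b needs a) are cheap to spot here;
    # longer cycles only show up in Step 2.
    for a in tasks:
        for b in deps[a]:
            if a in deps[b]:
                raise ValueError(f"contradictory constraints between {a!r} and {b!r}")
    return deps
```

For example, `compile_dependencies({"A", "B", "C"}, {"C": {"B"}}, {"A": {"B"}})` yields `{"A": set(), "B": {"A"}, "C": {"B"}}`.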

Step 2: Now we have a very simple problem, as there are only one-way dependencies. These can be considered preconditions: a task can only be performed if all of its preconditions are met. Now we can proceed as follows:

pack all tasks into a list named /notYetProcessed/
create empty list /result/
while there are still tasks to process in /notYetProcessed/ do:
    create empty list /remaining/
    for all tasks /t/ in /notYetProcessed/ do:
        if all dependencies of /t/ are already in /result/ then do:
            add /t/ to /result/
        else do:
            add /t/ to /remaining/
    if /length(notYetProcessed)/ matches /length(remaining)/ then do:
        terminate with error
    /notYetProcessed/ = /remaining/
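The pseudocode above translates almost line by line into Python. This sketch assumes `deps` maps every task name to the set of task names it depends on, which is the output of Step 1. It also adds a `done` set that the original does not have, so that the "all dependencies met" test is a set comparison instead of a scan of /result/.

```python
def order_by_passes(deps: dict[str, set[str]]) -> list[str]:
    """Step 2: in repeated passes, move every task whose dependencies are done."""
    not_yet_processed = list(deps)
    result: list[str] = []
    done: set[str] = set()  # same content as result, for O(1) look-ups
    while not_yet_processed:
        remaining: list[str] = []
        for t in not_yet_processed:
            if deps[t] <= done:  # all dependencies already in result
                result.append(t)
                done.add(t)
            else:
                remaining.append(t)
        if len(remaining) == len(not_yet_processed):
            # No progress in this pass: the remaining tasks contain a cycle.
            raise ValueError(f"unresolvable dependencies among {remaining}")
        not_yet_processed = remaining
    return result
```

Because tasks added in a pass count as done for later tasks in the same pass, `order_by_passes({"A": set(), "B": {"A"}, "C": {"B"}})` finishes in a single pass with `["A", "B", "C"]`.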

After the outer while loop terminates, /result/ contains all tasks in an order that satisfies all the constraints defined at the beginning.

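If the above algorithm terminates with an error, that means:
* no task could be processed in this pass,
* which means: some dependencies could not be resolved,
* which means: at least one task dependency cycle exists among the remaining tasks (provided Step 1 rejected dependencies on unknown tasks). The remaining tasks are not necessarily all on the cycle, because a task that merely depends on a cycle is left over as well.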
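To report which tasks actually form the cycle, one can start at any remaining task and keep following unmet dependencies until a task repeats. The sketch below assumes that `remaining` is the leftover list at the moment Step 2 gives up and that `done` is the set of tasks already in /result/. Both names, and `find_cycle` itself, are hypothetical. It also assumes Step 1 rejected unknown task names, so every remaining task has an unmet dependency that is itself a remaining task.

```python
def find_cycle(remaining: list[str], deps: dict[str, set[str]], done: set[str]) -> list[str]:
    """Return one dependency cycle (first task repeated at the end)."""
    path: list[str] = []
    index: dict[str, int] = {}  # task -> its position in path
    t = remaining[0]
    while t not in index:
        index[t] = len(path)
        path.append(t)
        # Follow an unmet dependency; min() only makes the choice deterministic.
        t = min(d for d in deps[t] if d not in done)
    return path[index[t]:] + [t]
```

With `deps = {"A": {"B"}, "B": {"A"}, "C": {"A"}}`, the walk from `"C"` returns `["A", "B", "A"]`. `C` is left over by Step 2 but is not part of the cycle.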

Step 3: Now process the tasks one by one in the order stored in /result/, and you'll be fine.

As you can see, this can be done without even building a special graph representation of the data. Topological sorting is performed "directly" on the data by accepting the first valid order (out of all possible sort orders) that we can get our hands on.

There might be even more efficient algorithms for this (once the dependency compilation of Step 1 has been performed) that I'm not aware of. If so, I'd be happy to learn about them!
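One well-known alternative is Kahn's algorithm. Instead of rescanning all unprocessed tasks in every pass, it keeps a count of unmet dependencies per task and a queue of tasks whose count has dropped to zero. This gives O(tasks + dependencies) time. The sketch assumes a `deps` mapping from each task to the set of tasks it depends on, with every dependency being a known task. The function name is hypothetical.

```python
from collections import deque


def kahn_order(deps: dict[str, set[str]]) -> list[str]:
    """Topological order via Kahn's algorithm; raises on a cycle."""
    # Reverse the edges: dependents[d] = tasks that wait for d.
    dependents: dict[str, list[str]] = {t: [] for t in deps}
    unmet = {t: len(ds) for t, ds in deps.items()}
    for t, ds in deps.items():
        for d in ds:
            dependents[d].append(t)

    ready = deque(t for t, n in unmet.items() if n == 0)
    result: list[str] = []
    while ready:
        t = ready.popleft()
        result.append(t)
        for w in dependents[t]:
            unmet[w] -= 1
            if unmet[w] == 0:  # last precondition of w just got satisfied
                ready.append(w)

    if len(result) != len(deps):
        stuck = [t for t in deps if unmet[t] > 0]
        raise ValueError(f"dependency cycle among {stuck}")
    return result
```

Each dependency edge is visited exactly once when its prerequisite is dequeued. The pass-based loop, by contrast, can rescan the same blocked tasks up to once per pass.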

score -1 · Answer 2 · Imran · 2018-01-29T04:42:55.073

You can start with any order and then walk that order, swapping any elements that are out of order. You can repeat this until no more tasks are out of order.

I would use a hash table (or simply an array) for quick look-ups to determine if tasks are out of order.

In Python (task ids are 0-based here):

from dataclasses import dataclass, field

@dataclass
class Task:
    id: int  # serial id of the task, i.e. 0..n-1 (its index in tasks)
    not_before: list[int] = field(default_factory=list)  # ids of tasks this task cannot precede
    not_after: list[int] = field(default_factory=list)  # ids of tasks this task cannot come after

def order_tasks(tasks: list[Task]) -> list[int]:
    n = len(tasks)
    order = list(range(n))  # task ids in initial order
    positions = list(range(n))  # positions[i] is the index of task i in order

    def swap_tasks(i: int, j: int) -> None:
        order[positions[i]], order[positions[j]] = j, i  # exchange the two tasks in order
        positions[i], positions[j] = positions[j], positions[i]

    while True:  # repeat until a full pass makes no swap
        made_swap = False
        for i in range(n):  # loop over task ids
            for j in tasks[i].not_before:
                if positions[i] < positions[j]:
                    swap_tasks(i, j)
                    made_swap = True
            for j in tasks[i].not_after:
                if positions[i] > positions[j]:
                    swap_tasks(i, j)
                    made_swap = True
        if not made_swap:  # never reached if the constraints contain a cycle
            return order

For n tasks and O(k) constraints per task this should run in O(n²log(k)), since a task can move at most n times (it can't go back past the position of the last task it was swapped with).

I thought about processing tasks in order and inserting the not_before tasks, followed by the task, followed by the not_after tasks, and then inserting (or moving, if they already appear) subsequent tasks to satisfy the constraints. But this doesn't really seem to help, since not_before and not_after tasks can be out of order with respect to each other, so we still need lots of swapping.

answered Jan 29 '18 at 04:37 by Imran · edited Jan 29 '18 at 04:42

Yes, I thought about something like that as well in the first place. But I could not figure out how to detect cycles in such an approach so I decided against this. Nevertheless thank you for sharing this interesting idea! – Regis May Jan 29 '18 at 08:17
By cycles do you mean that there might be conflicting constraints, i.e. A must be before B but B must be before A? Yes, I am assuming that all constraints are consistent. – Imran Jan 29 '18 at 08:32

Yes. Some specifications could be erroneous, as the definitions originate from humans: typos and such things, for example. See the description: there I already mention that cycle detection would be one activity required after a graph has been built (if such an explicit intermediate step of building a graph is performed). – Regis May Jan 29 '18 at 11:00
OK, a simple solution is to store constraints you have seen so far in a hash table, and then if you get a new constraint "x not before y" then check that you haven't seen "y not before x" - if so then ignore the constraint. – Imran Jan 29 '18 at 11:15
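The hash-table check Imran describes can be sketched in Python as follows. The class and method names are hypothetical. As Regis May notes, this only catches direct two-task contradictions.

```python
class ConstraintStore:
    """Remembers 'x not before y' constraints and skips direct contradictions."""

    def __init__(self) -> None:
        self.seen: set[tuple[str, str]] = set()

    def add_not_before(self, x: str, y: str) -> bool:
        """Record 'x not before y'; return False if 'y not before x' was seen."""
        if (y, x) in self.seen:
            return False  # direct contradiction: ignore the new constraint
        self.seen.add((x, y))
        return True
```

A three-task cycle such as "A not before B", "B not before C", "C not before A" passes this check unnoticed.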
Hm, that's not a bad idea! Keeping constraints and checking them for contradictions! This way it is a logical problem. I will remember that for the future. But don't forget: Only direct contradictory constraints can be detected that way. Cyclic or indirect contradictory constraints would still not be detectable so easily. Nevertheless it's an interesting idea. – Regis May Jan 29 '18 at 12:44
OK, yes, you could use a Disjoint Set Forest to detect non-trivial cycles. – Imran Jan 29 '18 at 16:02
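A disjoint-set forest tracks undirected connectivity. For directed precedence constraints it would therefore also flag harmless shapes such as "A before B", "A before C", "B before C", which contain no cycle. A common directed alternative is to check reachability before accepting each new constraint. Adding "x before y" closes a cycle exactly when x is already reachable from y. The sketch below uses an iterative depth-first search; the class and method names are hypothetical.

```python
from collections import defaultdict


class PrecedenceGraph:
    """Accepts 'x before y' constraints one at a time, rejecting any that close a cycle."""

    def __init__(self) -> None:
        self.successors: defaultdict[str, set[str]] = defaultdict(set)

    def reachable(self, start: str, target: str) -> bool:
        """Iterative depth-first search: is there a path start -> ... -> target?"""
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for nxt in self.successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def add_before(self, x: str, y: str) -> bool:
        """Record 'x before y' unless y already (transitively) precedes x."""
        if self.reachable(y, x):
            return False  # the edge x -> y would close a cycle
        self.successors[x].add(y)
        return True
```

Each check costs O(tasks + constraints) in the worst case, so this suits constraints that arrive incrementally. For a batch of constraints, a single pass of Step 2 or a topological sort detects the same cycles.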
