Assume I have activities or tasks that
- should all be executed.
- have no predetermined duration, but some activities take longer than others.
- are not CPU bound and are subject to network/IO latency and transient errors.
- have dependencies on others; in the example below, C can only execute once A and B have completed.
What is the most appropriate algorithm to use to schedule activities to minimize the total time to complete all tasks? My current approach is less than optimal because (in the example below) the way G is scheduled adds an additional delay of 20s to execution. The answer to this question got me down the path where I am.
Here's an example (written as if it were a DSL):
Task A
{
    Estimation: 10s;
}
Task B
{
    Estimation: 10s;
}
Task C
{
    Estimation: 10s;
    DependsOn A, B;
}
Task D
{
    Estimation: 10s;
    DependsOn C;
}
Task E
{
    Estimation: 10s;
    DependsOn C;
}
Task F
{
    Estimation: 10s;
    DependsOn E, D;
}
Task G
{
    Estimation: 30s;
    DependsOn A, B;
}
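To make the example concrete, here is a minimal sketch (separate from the TaskManager code) that encodes the tasks above and computes the earliest time each one could finish if it started the moment its dependencies were done. The class name `ExampleTimings`, the dictionary layout, and the assumption of unlimited parallelism are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExampleTimings
{
    // Hypothetical in-memory form of the DSL: name -> (estimate in seconds, dependencies).
    public static readonly Dictionary<string, (int Seconds, string[] DependsOn)> Tasks =
        new Dictionary<string, (int Seconds, string[] DependsOn)>
        {
            ["A"] = (10, new string[0]),
            ["B"] = (10, new string[0]),
            ["C"] = (10, new[] { "A", "B" }),
            ["D"] = (10, new[] { "C" }),
            ["E"] = (10, new[] { "C" }),
            ["F"] = (10, new[] { "E", "D" }),
            ["G"] = (30, new[] { "A", "B" }),
        };

    // Earliest finish of a task: its own estimate plus the latest earliest-finish
    // among its dependencies (memoized recursion over the DAG).
    public static int EarliestFinish(string name, Dictionary<string, int> memo)
    {
        if (memo.TryGetValue(name, out var known)) return known;
        var (seconds, dependsOn) = Tasks[name];
        var start = dependsOn.Length == 0 ? 0 : dependsOn.Max(d => EarliestFinish(d, memo));
        memo[name] = start + seconds;
        return memo[name];
    }

    // Lower bound on the total time: the longest chain of dependent estimates.
    public static int LowerBound()
    {
        var memo = new Dictionary<string, int>();
        return Tasks.Keys.Max(name => EarliestFinish(name, memo));
    }
}
```

With these estimates, F and G each have an earliest finish of 40s, so 40s is the best total this graph allows when nothing waits on an unrelated task.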
Here's what I did (in C#)
Create a graph (directed acyclic graph) of activities
The following code snippet is from a TaskManager class.
private static Graph<ITask> CreateGraph(IEnumerable<ITask> tasks)
{
    if (tasks == null)
        throw new ArgumentNullException(nameof(tasks));
    // Index tasks by Id so that the entries in DependsOn can be resolved to tasks.
    var nameMap = tasks.ToDictionary(task => task.Id);
    var graph = new Graph<ITask>(nameMap.Values);
    foreach (var task in nameMap.Values)
    {
        foreach (var dependencyName in task.DependsOn)
        {
            // Edge direction: dependency -> task that depends on it.
            var from = nameMap[dependencyName];
            var to = task;
            graph.AddDependency(from, to);
        }
    }
    return graph;
}
Perform a Topological Sort
public static Node<T>[] Sort<T>(this Graph<T> graph) where T : IComparable
{
    var stack = new Stack<Node<T>>();
    var visited = new HashSet<Node<T>>();
    foreach (var node in graph)
    {
        if (!visited.Contains(node))
        {
            visited.Add(node);
            InternalSort(node, stack, visited);
        }
    }
    return stack.ToArray();
}
private static void InternalSort<T>(Node<T> node, Stack<Node<T>> stack, ISet<Node<T>> visited)
    where T : IComparable
{
    // Depth-first, post-order: a node is pushed only after all of its unvisited dependants.
    var dependants = node.Dependants;
    foreach (var dependant in dependants)
    {
        if (!visited.Contains(dependant))
        {
            visited.Add(dependant);
            InternalSort(dependant, stack, visited);
        }
    }
    stack.Push(node);
}
This gave me something like [F,E,D,C,G,B,A]. If I used dependencies instead of dependents, it would have been [A,B,C,G,D,E,F].
Assign a Level to Each Node
Now that I have an array of sorted nodes, the next step is to update the Level property of each node.
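public static void Level<T>(this IEnumerable<Node<T>> nodes) where T : IComparable
{
    // The result is only meaningful if each node's Dependencies were assigned
    // their Level before the node itself is visited.
    foreach (var sortedTask in nodes)
    {
        sortedTask.Level = CalculateLevel(sortedTask.Dependencies);
    }
}
public static int CalculateLevel<T>(ICollection<Node<T>> nodes) where T : IComparable
{
    // No dependencies: first level. Otherwise one more than the highest dependency level.
    if (nodes.Count <= 0) return 1;
    return nodes.Max(n => n.Level) + 1;
}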
This gave me something like [F:1,G:1,E:2,D:2,C:3,B:4,A:4], where the letter is the activity name and the number is the level. If I did this in reverse, it would have looked something like [F:4,E:3,D:3,G:2,C:2,B:1,A:1].
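As a cross-check of the level numbers, the following is a minimal, self-contained sketch that computes dependency-first levels for the example graph from a plain dictionary. It does not use my Graph/Node types; the class name `LevelSketch` and the `DependsOn` map are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LevelSketch
{
    // Hypothetical dependency map for the example: task -> tasks it depends on.
    public static readonly Dictionary<string, string[]> DependsOn =
        new Dictionary<string, string[]>
        {
            ["A"] = new string[0],
            ["B"] = new string[0],
            ["C"] = new[] { "A", "B" },
            ["D"] = new[] { "C" },
            ["E"] = new[] { "C" },
            ["F"] = new[] { "E", "D" },
            ["G"] = new[] { "A", "B" },
        };

    // Level is 1 for tasks without dependencies, otherwise 1 + the highest dependency level.
    public static Dictionary<string, int> ComputeLevels()
    {
        var levels = new Dictionary<string, int>();
        foreach (var name in DependsOn.Keys)
            Visit(name, levels);
        return levels;
    }

    // Memoized recursion, so the result does not depend on the iteration order.
    private static int Visit(string name, Dictionary<string, int> levels)
    {
        if (levels.TryGetValue(name, out var known)) return known;
        var deps = DependsOn[name];
        var level = deps.Length == 0 ? 1 : deps.Max(d => Visit(d, levels)) + 1;
        levels[name] = level;
        return level;
    }
}
```

For the example this yields A:1, B:1, C:2, G:2, D:3, E:3, F:4, which places G in the same level as C.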
Group tasks
public static SortedDictionary<int, ISet<T>> Group<T>(this IEnumerable<Node<T>> nodes) where T : IComparable
{
    var taskGroups = new SortedDictionary<int, ISet<T>>();
    foreach (var sortedNode in nodes)
    {
        var key = sortedNode.Level;
        if (!taskGroups.ContainsKey(key))
        {
            taskGroups[key] = new SortedSet<T>();
        }
        taskGroups[key].Add(sortedNode.Value);
    }
    return taskGroups;
}
Execute Tasks
The following goes through each "level" and executes the tasks.
private async Task ExecuteAsync(IDictionary<int, ISet<ITask>> groups, ITaskContext context,
    CancellationToken cancellationToken)
{
    var keys = groups.Keys.OrderByDescending(i => i);
    foreach (var key in keys)
    {
        // Every task in this level must finish before the next level starts.
        var tasks = groups[key];
        await Task.WhenAll(tasks.Select(task => task.ExecuteAsync(context, cancellationToken)));
    }
}
The OrderByDescending was necessary when tasks were sorted from the most dependent to the least dependent node (F first, A or B last).
Problem
While this approach still executes faster than a sequential approach, no matter how I approach it, something is always waiting on G to complete. If G is grouped with C, then D and E will be delayed by 20s even though they are not dependent on G.
If I reverse the sorting (and adjust the code), then G only starts executing when F starts executing.
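The 20s figure can be reproduced with a small timing sketch. It assumes the estimates are exact, that G shares a level with C, and that each level is awaited as a whole (as the `Task.WhenAll` loop does); the class name `LevelBarrierTiming` and the tuple layout are hypothetical.

```csharp
using System;
using System.Linq;

public static class LevelBarrierTiming
{
    // Example tasks with a dependency-first level and an estimate in seconds.
    public static readonly (string Name, int Level, int Seconds)[] Tasks =
    {
        ("A", 1, 10), ("B", 1, 10),
        ("C", 2, 10), ("G", 2, 30),
        ("D", 3, 10), ("E", 3, 10),
        ("F", 4, 10),
    };

    // Awaiting a whole level means the level lasts as long as its slowest task;
    // the total is the sum of those per-level maxima.
    public static int TotalSeconds() =>
        Tasks.GroupBy(t => t.Level).Sum(g => g.Max(t => t.Seconds));

    // Time at which a level starts: the sum of the durations of all earlier levels.
    public static int LevelStart(int level) =>
        Tasks.Where(t => t.Level < level)
             .GroupBy(t => t.Level)
             .Sum(g => g.Max(t => t.Seconds));
}
```

Under these assumptions the level containing D and E starts at 40s, although C (their only dependency) is done at 20s, and the whole run takes 60s.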