1

I'm looking for a better way of recursing over items that may have cyclic dependencies. Currently, I pass a list of already processed items along in order to not process them again, but that is probably not the best way to do it.

Here's what I'm currently doing:


        /// <summary>
    /// caching dependencies in order to increase performance
    /// </summary>
    private static readonly IDictionary<string, IEnumerable<OwnedItem>> dependencies
        = new Dictionary<string, IEnumerable<OwnedItem>>();

        /// <summary>
    /// recursively find OwnedItem this oi depends upon
    /// in order to correctly handle cyclic dependencies, already considered
    /// dependencies need to be supplied as well (can be null or empty)
    /// </summary>
    /// <param name="oi"></param>
    /// <param name="parentDeps"></param>
    /// <returns></returns>
    private static IEnumerable<OwnedItem> GetDependencies(
        OwnedItem oi,
        IEnumerable<OwnedItem> parentDeps)
    {
        if (null == oi)
        {
            return Enumerable.Empty<OwnedItem>();
        }
        if (dependencies.ContainsKey(oi.UniqueId))
        {
            return dependencies[oi.UniqueId];
        }
        var comparer = new TCObjectComparer<OwnedItem>();
        var result = new HashSet<OwnedItem>(comparer);
        result.Add(oi);
        result.UnionWith(parentDeps ?? Enumerable.Empty<OwnedItem>());
        foreach (var oi2 in oi.AllUsedOwnedItemsToBeIncluded.Except(
                result, comparer))
        {
            result.UnionWith(GetDependencies(oi2, result));
        }
        dependencies[oi.UniqueId] = result;
        return result;
    }

The items are of type 'OwnedItem' and keep a list (IEnumerable<OwnedItem>) of their direct dependencies in property AllUsedOwnedItemsToBeIncluded but basically this should apply whenever 'items' keep a list of 'items' where cyclic dependencies can occur. Using the dictionary just avoids doing the same calculation more than once; it is not essential. Also, only a single instance of TCObjectComparer is needed, but that is not essential, either. Any suggestions? I think there must exist some classic algorithm to handle this, but I can't find it.

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
Daniel
  • 321
  • 3
  • 10
  • Could you please format the code sample, it's very difficult to read – Nick Bailey Jun 03 '16 at 12:36
  • You have the right idea as far as I can see from the description. But the code is hard to follow, I don't know if you are doing it correctly. – weston Jun 03 '16 at 12:36
  • It would be awesome if you posted a [mcve]. I would love to be able to copy and run your code. Right now there are no definitions for `OwnedItem` & `TCObjectComparer`. And there is no sample data and expected ouput. – Enigmativity Jun 03 '16 at 13:09
  • @Nick Bailey - apart from the indented `///summary` lines it looks pretty much like it does in Visual Studio in my browser @weston - there's a bug in there somewhere, but I'm not going to chase it now that i have three better algorithms @Enigmativity - You're right, that would have been ideal. However, in this case it would have meant a lot of additional work for a small benefit. The unknown classes are part of the TOSCA API (a tool for automated functional testing); but as stated that is quite irrelevant. – Daniel Jun 03 '16 at 14:58

4 Answers4

1

What you are trying to do is basically going through all the nodes of a connected graph. Your AllUsedOwnedItemsToBeIncluded property is the list of nodes connected to your current node.

You can have a look here to find some graph traversal algorithms that might help.

You algorithm is one way of doing a graph traversal. You'll have to go through each node and keep a list of the visited nodes to not visit him twice.

One other algorithm which reduces the number of traversals can be:

list nodesToExplore;
list exploredNodes;
nodesToExplore.add(startNode);

for all node in nodesToExplore
{
    nodesToExplore.remove(node);
    exploredNodes.add(node);

    for all child in node.childs
    {
        if(child not in exploredNodes)
           nodesToExplore.add(child);
    }
}

When it ends, the exploredNodes will contain what you need. Using an hashset/dictionnary instead of a list will improve the performance

Vincent
  • 3,656
  • 1
  • 23
  • 32
  • Very elegant answer (and quick, too). I've used this simple algorithm and it is at least twice as fast as the horrible (and buggy) mess I started with. Thank you, Vincent. – Daniel Jun 03 '16 at 14:22
1

The algorithm can be extracted to a class, keeping your code tidier and get rid of that smelly static field.

private static IEnumerable<T> GetDependencies(T oi)
{
    return new FlattenedCircularTree<OwnedItem>(oi, o => o.AllUsedOwnedItemsToBeIncluded)
       .AllNodes();
}

And the general algorithm is implemented like so:

public sealed class FlattenedCircularTree<T>
{
    private readonly T _root;
    private readonly Func<T, IEnumerable<T>> _getChildren;
    private readonly HashSet<T> _visited = new HashSet<T>();
    private readonly List<T> _nodes = new List<T>();

    public FlattenedCircularTree(T root, Func<T, IEnumerable<T>> getChildren)
    {
        _root = root;
        _getChildren = getChildren;
    }

    public IEnumerable<T> AllNodes()
    {
        FindNodes(_root);
        return _nodes;
    }

    private void FindNodes(T current)
    {
        if (!_visited.Add(current))
            return;
        _nodes.Add(current);
        IEnumerable<T> children = _getChildren(current);
        if (children != null)
            foreach (T child in children)
                FindNodes(child);
    }
}
weston
  • 54,145
  • 21
  • 145
  • 203
  • Nice solution, +1; technically, there's one shortcoming: often graph is too big (it can be even infinite one) to be flatten. Enumerating is more flexible (e.g. do breadth first search and either find the node or stop the process on 100000th item) solution. However, the graph in the question is not that large, so the shortcoming mentioned is just academic one. – Dmitry Bychenko Jun 03 '16 at 13:32
  • @DmitryBychenko I like your yield return solution, much more flexible as you say. Going to steal the `if(set.Add)`, save double look up. – weston Jun 03 '16 at 13:42
  • Looks perfect; I'll pitch it against my implementation of Vincent's algorithm. Thanks, weston. – Daniel Jun 03 '16 at 14:30
1

You can implement something like this:

  public static partial class LinqGraph {
    public static IEnumerable<T> SelectBreadthFirst<T>(this IEnumerable<T> source, 
                                                       Func<T, IEnumerable<T>> children) {
      if (Object.ReferenceEquals(null, source))
        throw new ArgumentNullException(nameof(source));
      else if (Object.ReferenceEquals(null, children))
        throw new ArgumentNullException(nameof(children));

      HashSet<T> proceeded = new HashSet<T>();

      Queue<IEnumerable<T>> queue = new Queue<IEnumerable<T>>();

      queue.Enqueue(source);

      while (queue.Count > 0) {
        IEnumerable<T> src = queue.Dequeue();

        if (Object.ReferenceEquals(null, src))
          continue;

        foreach (var item in src) 
          if (proceeded.Add(item)) {
            yield return item;

            queue.Enqueue(children(item));
          }
      }
    }
  } 

And then use it

  var items = new OwnedItem[] {startItem} // root nodes
    //TODO: provide a rule of returning children on given parent
    .SelectBreadthFirst(parent => parent.AllUsedOwnedItemsToBeIncluded); 
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • I think the children func should be `parent => parent.AllUsedOwnedItemsToBeIncluded` because `dependencies` isn't being updated any more right? – weston Jun 03 '16 at 13:21
  • @weston: thank you! I think you're quite right, since `dependencies` has been commented as *caching dependencies*. – Dmitry Bychenko Jun 03 '16 at 13:27
  • Another elegant and concise solution; I'll also check its performance. The graph in my case is indeed not updated during my operation. Thanks, Dmitry. – Daniel Jun 03 '16 at 14:34
0

Here's my implementation of Vincent's algorithm:


    private static readonly TCObjectComparer<OwnedItem> comparer
        = new TCObjectComparer<OwnedItem>();

    /// <summary>
    /// caching dependencies in order to increase performance
    /// </summary>
    private static readonly IDictionary<string, IEnumerable<OwnedItem>> dependencies
        = new Dictionary<string, IEnumerable<OwnedItem>>();

    /// <summary>
    /// recursively find OwnedItems this oi depends upon
    /// see http://stackoverflow.com/questions/37614469/how-to-recurse-over-items-having-cyclic-dependencies
    /// </summary>
    /// <param name="oi"></param>
    /// <returns></returns>
    private static IEnumerable<OwnedItem> GetDependencies(OwnedItem oi)
    {
        if (null == oi)
        {
            return Enumerable.Empty<OwnedItem>();
        }
        if (dependencies.ContainsKey(oi.UniqueId))
        {
            return dependencies[oi.UniqueId];
        }
        var resultExploredNodes = new HashSet<OwnedItem>(comparer);
        var nodesToExplore = new Queue<OwnedItem>();
        nodesToExplore.Enqueue(oi);
        while (nodesToExplore.Count > 0)
        {
            var node = nodesToExplore.Dequeue();
            resultExploredNodes.Add(node);
            // add nodes not already visited to nodesToExplore
            node.AllUsedOwnedItemsToBeIncluded
                .Except(resultExploredNodes, comparer)
                .ForEach(n => nodesToExplore.Enqueue(n));
        }
        dependencies[oi.UniqueId] = resultExploredNodes;
        return resultExploredNodes;
    }

Again, the caching is just there for performance, it is not essential for the algorithm.

Daniel
  • 321
  • 3
  • 10