6

Background:

I was working on a project were I needed to write some rules for text processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problems, we have topological sorting to help. But then I realized that I can't expect the graph to be always full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to Depth-first-searching of a graph.
I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something (see below) which works (inefficiently, I suspect).

The code:

This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more then once. Hence, the speculated inefficiency.
A word about the input. The rules I wrote are basically python classes, which have a class property depends. I was criticized for not using inspect.getmro- But this would complicate thing terribly because the class would need to inherit from each other (See example here)

def _yield_name_dep(rules_deps):
    global recursion_counter
    recursion_counter = recursion_counter +1 
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

OK, now that you stared in the code, here is some input you can test:

demo_class_content ="""
class A(object):
    depends = ('B')
    def __str__(self):
        return self.__class__.__name__

class B(object):
    depends = ('C','F')
    def __str__(self):
        return self.__class__.__name__

class C(object):
    depends = ('D', 'E')
    def __str__(self):
        return self.__class__.__name__

class D(object):
    depends = None
    def __str__(self):
        return self.__class__.__name__   

class F(object):
    depends = ('E')
    def __str__(self):
        return self.__class__.__name__

class E(object):
    depends = None  
    def __str__(self):
        return self.__class__.__name__
"""       

with open('demo_classes.py', 'w') as clsdemo:
    clsdemo.write(demo_class_content)

import demo_classes as rules

rule_start={'A': ('B')}

def _yield_name_dep(rules_deps):
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

if __name__ == '__main__':
    # this is yielding nodes visited multiple times, 
    # list(_yield_name_dep(rule_start))
    # hence, my work around was to use set() ...
    rule_dependencies = list(set(_yield_name_dep(rule_start)))
    print rule_dependencies

The questions:

  • I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
  • How can I improve this function to skip visited nodes, and still use generators ?

update:

Just to save you the trouble running the code, the output of the above function is:

>>> print list(_yield_name_dep(rule_wd))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E'), ('E', None)]
>>> print list(set(_yield_name_dep(rule_wd)))
[('B', ('C', 'F')), ('E', None), ('D', None), ('F', 'E'), ('C', ('D', 'E')), ('A', 'B')]

In the mean while I came up with a better solution, the question above still remain. So feel free to criticize my solution:

visited = []
def _yield_name_dep_wvisited(rules_deps, visited):
    # yield all rules by their name and dependencies
    for rule, dep in rules_deps.items():
        if not dep and rule not in visited:
            yield rule, dep
            visited.append(rule)
            continue
        elif rule not in visited:
            yield rule, dep
            visited.append(rule)
            for ii in dep:
                i = getattr(grules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep_wvisited(new_dep, visited):
                        if dep not in visited:
                            yield dep    
                    
                elif str(instance) not in visited:
                    visited.append(str(instance))
                    yield str(instance), instance.depends

The output of the above is:

>>>list(_yield_name_dep_wvisited(rule_wd, visited))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E')]

So as you can see now the node E is visited only once.

Community
  • 1
  • 1
oz123
  • 27,559
  • 27
  • 125
  • 187
  • Why not store the class itself in `depends` rather than the name of a class? – Eric Oct 14 '13 at 10:40
  • @Eric, first because I would not know how to do it. Second, classes here are just for the demo. The code works on some properties defined in configuration file, and I wanted a self-containing example. Can you show what you mean? – oz123 Oct 14 '13 at 10:43
  • `depends = (E, F)`, without quotes. Note you'll need to define the things being depended on earlier in the file than the things which depend on them, but that's probably a good idea for readability anyway. – Eric Oct 14 '13 at 10:59
  • I believe this question is more suited to [codereview.SE](http://codereview.stackexchange.com/?as=1) Anyway, I believe you could simplify the code if you *always* used tuples for `depends`. I mean, having `depends = None`, `depends = 'A'` and `depends = ('A', 'B')` you have to treat explicitly each of these cases. You could simply use `depends = ()`, `depends = ('A',)` and `depends = ('A', 'B')` and have more uniform code. An other thing: `instance = i(); if instance.depends:` Since `depends` is a *class* attribute you don't have to instantiate the class. Simply do `if i.depends:`. – Bakuriu Oct 14 '13 at 11:34
  • If you always used tuples you could remove completely the `if not dep ...`, since the `for` would never be executed without dependencies. Right know your code works only if dependencies have single letter names, in other cases you'd have problems with `depends = 'MultiLetterName'` vs `depends = ('A', 'B')` and you'd have to check whether `dep` is a string. – Bakuriu Oct 14 '13 at 11:37
  • @Bakuriu, thanks for the tipp with instance.depends. Second, my code works in the real project also with dependecies decelared on strings, not single letteres. You are right, in the real code I am really only using tupples. – oz123 Oct 14 '13 at 12:14

2 Answers2

2

Using the feedback from Gareth and other kind users of Stackoverflow, here is what I came up with. It is clearer, and also more general:

def _dfs(start_nodes, rules, visited):
    """
    Depth First Search
    start_nodes - Dictionary of Rule with dependencies (as Tuples):    

        start_nodes = {'A': ('B','C')}

    rules - Dictionary of Rules with dependencies (as Tuples):
    e.g.
    rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
             'D':(), 'E':(), 'F':()}
    The above rules describe the following DAG:

                    A
                   / \
                  B   C
                 / \ / \
                D   E   F
    usage:
    >>> rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
                 'D':(), 'E':(), 'F':()}
    >>> visited = []
    >>> list(_dfs({'A': ('B','C')}, rules, visited))
    [('A', ('B', 'C')), ('B', ('D', 'E')), ('D', ()), ('E', ()), 
    ('C', ('E', 'F')), ('F', ())]
    """

    for rule, dep in start_nodes.items():
        if rule not in visited:
            yield rule, dep
            visited.append(rule)
            for ii in dep:
                new_dep={ ii : rules[ii]}
                for dep in _dfs(new_dep, rules, visited):
                    if dep not in visited:
                        yield dep
oz123
  • 27,559
  • 27
  • 125
  • 187
1

Here is another way to do do a breadth first search without duplicating the visited nodes.

import pylab
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from([x for x in 'ABCDEF'])
G.nodes()

returns ['A', 'C', 'B', 'E', 'D', 'F']

G.add_edge('A','B')
G.add_edge('A','C')
G.add_edge('B','D')
G.add_edge('B','E')
G.add_edge('C','E')
G.add_edge('C','F')

and here is how you can traverse the tree without duplicating nodes.

nx.traversal.dfs_successors(G)

returns {'A': ['C', 'B'], 'B': ['D'], 'C': ['E', 'F']} and you can draw the graph.

nx.draw(G,node_size=1000)
Back2Basics
  • 7,406
  • 2
  • 32
  • 45
  • actually, that was my prefered solution. My Colleague was really against adding 'another dependency' to out project. I think it is always better to avoid re-inventing the wheel. Although, when you re-invent the wheel you learn alot. – oz123 Oct 27 '13 at 18:20
  • 1
    You can spend your time learning a lot about what others have done or continuing with your research which may help everyone learn something new. There are a lot of algorithms out there. Be careful with your time. I wish you well. – Back2Basics Oct 28 '13 at 04:48