Background:
I was working on a project were I needed to write some rules for text processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problems, we have topological sorting to help. But then I realized that I can't expect the graph to be always full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to Depth-first-searching of a graph.
I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something (see below) which works (inefficiently, I suspect).
The code:
This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more then once. Hence, the speculated inefficiency.
A word about the input. The rules I wrote are basically python classes, which have a class property depends
. I was criticized for not using inspect.getmro
- But this would complicate thing terribly because the class would need to inherit from each other (See example here)
def _yield_name_dep(rules_deps):
global recursion_counter
recursion_counter = recursion_counter +1
# yield all rules by their named and dependencies
for rule, dep in rules_deps.items():
if not dep:
yield rule, dep
continue
else:
yield rule, dep
for ii in dep:
i = getattr(rules, ii)
instance = i()
if instance.depends:
new_dep={str(instance): instance.depends}
for dep in _yield_name_dep(new_dep):
yield dep
else:
yield str(instance), instance.depends
OK, now that you stared in the code, here is some input you can test:
demo_class_content ="""
class A(object):
depends = ('B')
def __str__(self):
return self.__class__.__name__
class B(object):
depends = ('C','F')
def __str__(self):
return self.__class__.__name__
class C(object):
depends = ('D', 'E')
def __str__(self):
return self.__class__.__name__
class D(object):
depends = None
def __str__(self):
return self.__class__.__name__
class F(object):
depends = ('E')
def __str__(self):
return self.__class__.__name__
class E(object):
depends = None
def __str__(self):
return self.__class__.__name__
"""
with open('demo_classes.py', 'w') as clsdemo:
clsdemo.write(demo_class_content)
import demo_classes as rules
rule_start={'A': ('B')}
def _yield_name_dep(rules_deps):
# yield all rules by their named and dependencies
for rule, dep in rules_deps.items():
if not dep:
yield rule, dep
continue
else:
yield rule, dep
for ii in dep:
i = getattr(rules, ii)
instance = i()
if instance.depends:
new_dep={str(instance): instance.depends}
for dep in _yield_name_dep(new_dep):
yield dep
else:
yield str(instance), instance.depends
if __name__ == '__main__':
# this is yielding nodes visited multiple times,
# list(_yield_name_dep(rule_start))
# hence, my work around was to use set() ...
rule_dependencies = list(set(_yield_name_dep(rule_start)))
print rule_dependencies
The questions:
- I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
- How can I improve this function to skip visited nodes, and still use generators ?
update:
Just to save you the trouble running the code, the output of the above function is:
>>> print list(_yield_name_dep(rule_wd))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E'), ('E', None)]
>>> print list(set(_yield_name_dep(rule_wd)))
[('B', ('C', 'F')), ('E', None), ('D', None), ('F', 'E'), ('C', ('D', 'E')), ('A', 'B')]
In the mean while I came up with a better solution, the question above still remain. So feel free to criticize my solution:
visited = []
def _yield_name_dep_wvisited(rules_deps, visited):
# yield all rules by their name and dependencies
for rule, dep in rules_deps.items():
if not dep and rule not in visited:
yield rule, dep
visited.append(rule)
continue
elif rule not in visited:
yield rule, dep
visited.append(rule)
for ii in dep:
i = getattr(grules, ii)
instance = i()
if instance.depends:
new_dep={str(instance): instance.depends}
for dep in _yield_name_dep_wvisited(new_dep, visited):
if dep not in visited:
yield dep
elif str(instance) not in visited:
visited.append(str(instance))
yield str(instance), instance.depends
The output of the above is:
>>>list(_yield_name_dep_wvisited(rule_wd, visited))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E')]
So as you can see now the node E is visited only once.