I'm seeking advice on elegant designs for representing a file directory without symlinks in Python where I can query the relationships in terms of "belongs to" (e.g. G is a subdirectory of /A/B/C). My current thinking goes in this direction:

Given a root path, I walk it top down with os.walk(). Two classes represent the node types I'm interested in, and I keep track of the parent-child relationships.

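class ADir(object):
    """A directory node: knows its parent and its children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, child):
        # 'child' rather than 'id', which would shadow the builtin id()
        self.children.append(child)

class AFile(object):
    """A file node: a leaf that only knows its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent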
I would have to re-implement checks for existing directories, functions that give me the position of a directory or file, and so on. It all starts to feel very much like a reimplementation of existing, general tree algorithms.
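For illustration, here is a minimal sketch of the kind of helper this implies, assuming a parent pointer on every node. The `Node` class and the method names `ancestors`, `belongs_to` and `path` are hypothetical stand-ins, not part of any library.

```python
class Node:
    """Hypothetical stand-in for ADir/AFile: a name plus a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield the parent, grandparent, ... up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def belongs_to(self, other):
        """True if 'other' is somewhere above this node in the tree."""
        return any(a is other for a in self.ancestors())

    def path(self):
        """Rebuild the position of this node as a slash-separated string."""
        names = [self.name] + [a.name for a in self.ancestors()]
        return "/" + "/".join(reversed(names))


# Build /A/B/C/G by hand and query it.
a = Node("A")
b = Node("B", a)
c = Node("C", b)
g = Node("G", c)
print(g.belongs_to(c))  # G lies below /A/B/C
print(g.path())
```

The query itself is only a walk up the parent chain, so the amount of tree machinery actually needed for "belongs to" is small.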

Trawling through StackExchange, Google, etc. yields a host of different approaches. None that I found seemed to make use of the natural boundaries given by a directory structure.

Any thoughts and pointers to discussions, blog entries and code are appreciated.

Axial
  • What you have seems reasonable. I'm not sure what the question is. Maybe if you explain what you're trying to accomplish... – Jim Garrison Mar 07 '12 at 08:12
  • The gist of my question is whether I'm trying to reinvent the wheel when mapping a file directory structure to a data model in Python. The original specification suggested translating the directory structure to XML and handling queries with something like lxml.etree. That seems overkill and clunky to me, especially since I can't attach the 'payload' (operations on a set of files) elegantly. Maybe this explains a bit better what the background to my rather broad question is. – Axial Mar 08 '12 at 03:51

1 Answer

The problem with tree structures in today's languages is that it's hard to create one structure that fits every use. There are many ways to build trees: with or without parent pointers, with children stored as pairs (binary or red-black trees) or as lists (with or without an index of lookup keys).

While it is possible to define the traversal algorithms for all of them, each algorithm needs a distinct implementation for each kind of tree.

Then there is the problem of looking up elements in the tree. Do we work by index (pretty useless in binary trees)? By some identifier? What type should the identifier have? How do we build paths from those identifiers? How do we represent relative paths?
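For file paths specifically, Python's standard `pathlib` module already answers several of these questions, so here is a small sketch under that assumption. It treats paths purely as text (nothing is checked on disk), and the helper name `is_inside` is made up for the example.

```python
from pathlib import PurePosixPath


def is_inside(path, ancestor):
    """True if 'ancestor' is one of the directories above 'path'."""
    return PurePosixPath(ancestor) in PurePosixPath(path).parents


g = PurePosixPath("/A/B/C/G")
c = PurePosixPath("/A/B/C")

# The identifier is the tuple of name components.
print(g.parts)

# A relative path is the absolute one with the ancestor's prefix removed.
print(g.relative_to(c))

# "Belongs to" is a membership test on the chain of parents.
print(is_inside(g, c))
```

Here the identifier type is a string per path component, a path is a sequence of those, and a relative path is simply a path without the leading root.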

Which is why we have maps and lists built into many modern languages but no trees. To my knowledge, Scala is one of the few OO languages which support the concept of a generic tree type but only binary trees and even those are somewhat odd.

On top of that, most OO languages don't support enough ways to build classes from fragments of existing classes. You can inherit (but then you get everything), multiple inherit (even more problems), mix in (some features of multiple inheritance without some of the drawbacks). But I'm really missing a feature which says: take method x() from type Foo and method y() from Bar to build Baz.

Without that, an OO-based tree base class would need a lot of tweaking for your specific use case, while implementing the same functionality directly would take the same number of lines of code (or even fewer).
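As a rough idea of how little code the direct route can take, here is a sketch that assumes a plain dict keyed by directory path is enough for the queries in the question. The function names `build_tree` and `belongs_to` are invented for this example, and the "belongs to" check is purely lexical, so both paths should be given in the same form (both absolute or both relative).

```python
import os


def build_tree(root):
    """Map each directory path under root to (subdirectory names, file names)."""
    tree = {}
    # os.walk visits directories top down by default.
    for dirpath, dirnames, filenames in os.walk(root):
        tree[os.path.normpath(dirpath)] = (sorted(dirnames), sorted(filenames))
    return tree


def belongs_to(path, ancestor):
    """True if 'path' lies strictly below 'ancestor' (no disk access)."""
    path = os.path.normpath(path)
    ancestor = os.path.normpath(ancestor)
    if path == ancestor:
        return False
    return os.path.commonpath([path, ancestor]) == ancestor
```

A call such as `belongs_to("/A/B/C/G", "/A/B/C")` then answers the question from the original post, and any payload can be attached by storing richer values in the dict.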

Aaron Digulla
  • Thanks Aaron, this is very helpful and verifies my suspicion that a customized implementation is necessary. The pointer to Scala and its generic tree type is interesting as well. – Axial Mar 08 '12 at 03:39