5

I have just read "isinstance() considered harmful", and it seems reasonable. In short, it argues for avoiding the use of this function.

Well, just now I happen to be writing a program which takes inputs structured as a tree, and needs the tree's structure information. Without time to implement a GUI, I impose on the user to write it to a configuration file (I know this is a bad interface, but the schedule is really tight). My users are very technical, but don't necessarily know python. I chose that the file will contain lists of lists (of lists of lists etc) representing the input trees, with the final elements being the leaf nodes of the trees. I think this is much better than imposing the synthax of dictionaries on users.

I plan to parse the lists recursively as the following (ommiting the use of the tree's structure, let's simplify and say treatLeafNode() must be called on each leaf node):

def parseTree(input):
    if isinstance (input, list):
        for item in input:
            parseTree(item)
    else:
        treatLeafNode(item)

In light of the article, I'm wondering if there is a simple way to recurse over that without using isinstance()...

Does anyone know one?

Emilio M Bumachar
  • 2,532
  • 3
  • 26
  • 30

5 Answers5

10

Your situation is one of those where I would use isinstance. Your data structure is well-constrained, and you need to distinguish between a list and not a list. Use isinstance to ask if it is a list. You don't say, but I imagine strings might be among the leaves of your tree, and they are iterable as lists are, so it's fiddly to distinguish between them in duck-typing ways.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
5

You could use

def parseTree(input):
    try:
        for item in input:
            parseTree(item)
    except TypeError:
        treatLeafNode(item)

Note that this will also iterate over strings though.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • 1
    @Sven: Maybe Andrey meant that `for item in input` will also succeed if `input` is a string, making this code fail for a "tree" like `["abc", ["rty", "xyz"]]` – Eli Bendersky May 04 '11 at 11:53
  • @Eli: Didn't I state that in my post? Maybe not as clear as you just did! :) – Sven Marnach May 04 '11 at 12:01
  • 1
    @Sven: sorry, noticed it after commenting. I do agree with Ned though, that the proximity between strings and lists is a problem here, which makes `isinstance` better. Another alternative that could work is checking `hasattr(item, '__iter__')`, since strings don't have that – Eli Bendersky May 04 '11 at 12:16
  • @Eli: I don't really like `hasattr(item, '__iter__')` because it's a bit too cryptic (and requires that everyone reading the code knows those sublteties). I'd go for the more explicit `isinstance(item, collections.Iterable) and not isinstance(item, basestring)` If that's the semantics I was looking for. – Sven Marnach May 04 '11 at 12:35
  • @Sven: but that's back to `isinstance`, so it's much easier on the eyes to just call the original `isinstance(list)` than what you propose :-) I agree that `hasattr` is not the most obvious workaround, but so isn't the `try ... except` - it takes some Python expertise to guess that it attempts to catch an exception from the `in` operator - note that it will also catch unrelated `TypeError` exceptions thrown by `treatLeafNode` down the tree – Eli Bendersky May 04 '11 at 12:46
  • @Eli: Avoiding `isinstance()` is all about increasing the applicability of your code to different types. Using `isinstance(item, list)` won't even allow trees of tuples instead of lists, let alone any user-defined types. – Sven Marnach May 04 '11 at 12:53
2

What might work better is encapsulating your tree structure with a Node object which can hold a value and a list of children:

class Node(object):
    def __init__(self, children=[], value=None):
        self.children = children
        self.value = value
    def isLeaf(self):
        return len(self.children) == 0

Now a node is explicitly either a leaf with a value or an element with children (technically, non-leaf nodes can also have values in this example, but your application code can choose to enforce that non-leaf nodes never have values). parseTree can be written as:

def parseTree(node):
    if node.isLeaf():
        treatLeafNode(node)
    else:
        for child in node.children:
            parseTree(child)

Note that this is a depth-first search on the tree.

There may be nicer ways to wrap this up so that parseTree is a method of Node, but this should give you an idea. Of course you still have the problem that you're asking the user to write Python code which is lists of lists as the input, and to parse that into the above tree structure you'd need to use isinstance. Perhaps yaml would be a better choice of description language, as the users cannot then inject arbitrary Python code into your input?

Luke Marsden
  • 131
  • 3
0

How about using yaml? You wont have to do the validation and the parsing logic yourself too.

The Tree could look like

- [[aMAN],[sTACK, OVER],[FLOW]]
- [[aMAN1],[sTACK1, OVER1],[FLOW1]]
- [[aMAN2],[sTACK2, OVER2],[FLOW2]]

Code to parse it

import yaml                    
f= open('test.yml')
test = yaml.load(f.read())
print test

Output:

[[['aMAN'], ['sTACK', 'OVER'], ['FLOW']], [['aMAN1'], ['sTACK1', 'OVER1'], ['FLOW1']], [['aMAN2'], ['sTACK2', 'OVER2'], ['FLOW2']]]
anijhaw
  • 8,954
  • 7
  • 35
  • 36
0

Is there a reason that you've chosen a list of lists as your preferred tree structure? I can think of many better ways to write one in a config file. Suppose you are trying to encode:

a
|-- b
|   |-- c
|   |-- d
|   |   |-- e
|   |   `-- f
|   `-- g
|       `-- h
|-- i
`-- j
    `-- k

How about

a: b, i, j
b: c, d, g
d: e, f
g: h
j: k

You can parse that into a dictionary pretty easily, and join it up into a tree. In fact, I think ConfigParser will already do it for you.

Or else, how about:

a
----b
--------c
--------d
------------e
------------f
--------g
------------h
----i
----j
--------k
Katriel
  • 120,462
  • 19
  • 136
  • 170