I'm encountering weird behavior while working with lists in Python. I've implemented a function that returns a list of lists of integers; specifically, these are cycles within a graph, each consisting of three nodes:

simple_cycles = compute_cycles(graph)

It returns something like this:

[[40000,20000,30000],[700,500,600],[600,500,700],..]

Now I need to (1) sort each inner list, then (2) remove duplicates from the outer list, and finally (3) sort the outer list as well. The desired result might look as follows:

[[500,600,700],[20000,30000,40000]]

Task (1) is handled inside compute_cycles, which sorts each inner list before returning it. Tasks (2) and (3) are handled by the following line:

cycles = dict((x[0], x) for x in simple_cycles).values()
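
To make the effect of that line concrete, here is a small sketch with made-up cycles (the data is hypothetical, not taken from the graph in question). Because only the first element serves as the key, exact duplicates are dropped, but two different cycles that share their smallest node also collapse into a single entry, which may or may not be intended:

```python
# Hypothetical, already-sorted cycles; three of them start with 500.
simple_cycles = [
    [20000, 30000, 40000],
    [500, 600, 700],
    [500, 600, 700],   # exact duplicate
    [500, 800, 900],   # different cycle, same first element
]

# Same construction as in the question: key each cycle by its first element.
by_first = dict((x[0], x) for x in simple_cycles)

# Later entries overwrite earlier ones with the same key, so only the
# last cycle starting with 500 survives.
cycles = sorted(by_first.values())
print(cycles)  # [[500, 800, 900], [20000, 30000, 40000]]
```

Note that the explicit `sorted` call is what orders the result here; the dictionary itself is only used for deduplication.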

This works for the first graph processed. Every following graph fails, because the ordering within the inner lists is sometimes wrong. I ran the line above twice, and the second result differed from what I expected. For example, in the second run I got this as x:

[29837921, 27629939, 27646591]

instead of

[27629939, 27646591, 29837921]

This results in 29837921 being chosen as the dictionary key instead of 27629939. So the initial ordering with sorted(x) already seems to be wrong. But why?

I tried to reproduce that behavior outside of my program, but I can't. In my application, I am parsing an XML document like this:

detector = MyParser()
handler = MyHandler()
handler.subscribe(detector.update)
detector.parse(filename, handler)

..

def parse(self, infile, handler):
  parser = etree.XMLParser(target=handler)
  etree.parse(infile, parser)

When executing, for example,

detector = MyParser()
handler = MyHandler()
handler.subscribe(detector.update)
detector.parse(filename, handler)
detector.parse(filename, handler)

then the ordering of the second run is unexpected.

I know my source code example is not enough to reproduce the problem yourself, but maybe I am missing something basic about how Python handles lists.

Update

Here is the creation of the lists:

from networkx import dfs_successors

def compute_cycles(graph):
  cycles = []
  for node in graph.nodes():
    for a_node in graph.successors(node):
      for next_node in graph.successors(a_node):
        # node -> a_node -> next_node closes into a three-node cycle
        # if there is an edge from next_node back to node.
        if node in graph.successors(next_node):
          cycles.append(sorted([node, a_node, next_node]))
  return cycles

Update

I made a big mistake: I had overridden the __repr__ method of the Node objects used in the graph so that a node prints as its integer revision id. The sorting probably fails because I am dealing with real objects rather than integers. I changed my call to sorted as follows:

cycles.append(sorted([node, a_node, next_node], key=lambda revision: revision.rev.revid))

I'll have to see if that makes a difference. The node class is defined as follows:

class Node(object):
  def __init__(self, revision, revision_hash):
    self.rev = revision
    self.revhash = revision_hash

  def __repr__(self):
    return repr((self.rev.revid))
labrassbandito

3 Answers

3

I don't understand why you're using dict.

print sorted(set(tuple(sorted(x)) for x in L))
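
Spelled out as a runnable sketch (the helper name `unique_sorted_cycles` and the sample data are assumptions for illustration, and it presumes the nodes are plain, orderable, hashable values such as integers): tuples are hashable, so a set can drop duplicates by comparing whole cycles, and the final `sorted` orders the result.

```python
def unique_sorted_cycles(cycles):
    """Sort each cycle, drop duplicates, and return the cycles in sorted order."""
    # Lists are not hashable, so each sorted cycle is turned into a tuple
    # before it goes into the set.
    unique = set(tuple(sorted(cycle)) for cycle in cycles)
    # Convert back to lists to match the shape used in the question.
    return [list(cycle) for cycle in sorted(unique)]


L = [[40000, 20000, 30000], [700, 500, 600], [600, 500, 700]]
result = unique_sorted_cycles(L)
print(result)  # [[500, 600, 700], [20000, 30000, 40000]]
```

Unlike keying a dictionary on the first element, this keeps two distinct cycles that happen to share their smallest node.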
Ignacio Vazquez-Abrams
  • It seems like he's using only the first element of each triplet as the key, so `dict` would be needed. – interjay Mar 24 '11 at 10:26
  • @interjay He's using the first element as key as a means to sort, and this doesn't work since dictionaries do not guarantee ordering. – juanchopanza Mar 24 '11 at 10:40
1

Dictionaries do not necessarily keep the order. They are allowed to change it. Put this in the interpreter: {'a': 1, 'b': 2, 'c': 3}. I got {'a': 1, 'c': 3, 'b': 2}.

Tyler Crompton
  • I forgot to mention that I am not interested in the dictionary, only in the values. Thus, I am executing cycles.sort() in the end. – labrassbandito Mar 24 '11 at 10:45
0

My problem is finally solved. Because I put objects in lists instead of simple Integers, I had to use the sort method as follows:

cycles.append(sorted([node, a_node, next_node], key=lambda revision: revision.rev.revid))

Here, I am accessing the member variable containing the integer, which is the same value that __repr__ already returned. However, sorting does not use __repr__, so without the key the resulting order was not reliable.
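
A minimal, self-contained sketch of that fix. `Revision` is a hypothetical stand-in for whatever object carries `revid`, and the hash strings are made up; `Node` mirrors the class from the question. It illustrates that `__repr__` only changes how a node prints, while the `key` function is what determines the order. (On Python 3, calling `sorted` on such objects without a key raises `TypeError`, since `Node` defines no ordering.)

```python
class Revision(object):
    """Hypothetical stand-in for the real revision object."""
    def __init__(self, revid):
        self.revid = revid


class Node(object):
    def __init__(self, revision, revision_hash):
        self.rev = revision
        self.revhash = revision_hash

    def __repr__(self):
        # Affects printing only; sorting never consults __repr__.
        return repr(self.rev.revid)


nodes = [
    Node(Revision(29837921), "c"),
    Node(Revision(27629939), "a"),
    Node(Revision(27646591), "b"),
]

# Sort explicitly by the integer revision id.
ordered = sorted(nodes, key=lambda node: node.rev.revid)
print(ordered)  # [27629939, 27646591, 29837921]
```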

labrassbandito