This is actually a question about the semantics of set comprehensions, but I first need to explain the context. I'm trying to create a new set of tuples in which each pair of values appears only once, regardless of the order of the values in the pair. Simplifying my actual program, what I have is something like {(1, 2), (2, 1), (3, 4)} and I'd like to get {(1, 2), (3, 4)}.

I tried doing something like this:

oldSet = {(1, 2), (2, 1), (3, 4)}

newSet = set()
newSet = {(val1, val2) for (val1, val2) in oldSet if not (val2, val1) in newSet}

However, newSet is {(1, 2), (2, 1), (3, 4)}, implying that something is wrong with my conditional expression. My understanding of comprehensions suggests that the above is syntactic sugar for something like this:

newSet = set()
for (val1, val2) in oldSet:
  if not (val2, val1) in newSet:
    newSet.add((val1, val2))

This traditional looping structure works (newSet is {(1, 2), (3, 4)}). Is there something about comprehensions that causes the conditional to be evaluated before newSet has any members? I'm fairly new to Python, so I'm wondering if there's something subtle I'm missing.

Thanks!

Chris Brendel
  • `in newSet` is checking *in the empty set `newSet = set()`*, not the "work-in-progress" set you're creating to replace it. The whole set comprehension is evaluated **before** the result is assigned to the name `newSet` to replace the empty set (which is why you get a `NameError` if you remove the preceding line). Your understanding of comprehensions is therefore not quite correct! – jonrsharpe Jun 19 '15 at 15:45
  • Thank you, @jonrsharpe! This makes sense now. – Chris Brendel Jun 19 '15 at 17:27

2 Answers

You misunderstood; the set comprehension is a distinct expression, separate from the assignment. The expression produces a new set object, which is then assigned to newSet, replacing the empty set() object the name was bound to before.

As such, while the comprehension iterates and builds its result, the separate set() object still bound to newSet remains empty. In effect, the set comprehension does this:

newSet = set()
_result = set()
for (val1, val2) in oldSet:
    if not (val2, val1) in newSet:
        _result.add((val1, val2))
newSet = _result
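
To make that visible, here is a minimal sketch that records the size of the set the condition actually consults on every check. The helper `reversed_missing` and the names `original` and `sizes_seen` are hypothetical, introduced only for this illustration:

```python
old_set = {(1, 2), (2, 1), (3, 4)}

new_set = set()
original = new_set  # second reference to the empty set created above

sizes_seen = []  # size of the set consulted at each condition check


def reversed_missing(pair):
    """Hypothetical helper: same test as the comprehension's condition."""
    a, b = pair
    sizes_seen.append(len(new_set))
    return (b, a) not in new_set


new_set = {pair for pair in old_set if reversed_missing(pair)}

# Every check ran against the empty set, so nothing was filtered out.
# sizes_seen == [0, 0, 0] and new_set == old_set
# The name now points at a different object; the original set is still empty.
# new_set is not original and original == set()
```

The name is only rebound after the last check has run, which is why the filter never sees any of the tuples being collected.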

You could use side effects to alter a separate set as you iterate:

seen = set()
newSet = {(val1, val2) for (val1, val2) in oldSet
          if not ((val2, val1) in seen or seen.add((val1, val2)))}

This uses seen to track what has already been processed and includes a tuple if both conditions are true:

  • the inverse has not already been seen before,
  • the seen.add() operation for the tuple returns a false value. Since seen.add() always returns None, that is always going to be the case.
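
A small runnable sketch of that condition, with the sample data from the question. Which orientation of a pair survives depends on the iteration order of the set, so the comments only describe properties that hold either way:

```python
old_set = {(1, 2), (2, 1), (3, 4)}

seen = set()
new_set = {
    (a, b)
    for (a, b) in old_set
    # `or` short-circuits: seen.add() only runs when the reversed tuple
    # has not been recorded yet, and its None result makes `not ...` True.
    if not ((b, a) in seen or seen.add((a, b)))
}

# Exactly one of (1, 2) and (2, 1) is kept, together with (3, 4).
# seen ends up holding the same tuples as new_set.
```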

Note that this now builds the same set twice, so you may as well do a regular loop and be done with it:

newSet = set()
for (val1, val2) in oldSet:
    if not (val2, val1) in newSet:
        newSet.add((val1, val2))

Since your tuples consist of only two values, you may as well use sorting here; any pair of tuples (a, b), (b, a) has one unique sorted form, after all:

newSet = {tuple(sorted(t)) for t in oldSet}
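
One difference from the loop is worth spelling out: sorting puts every pair into a canonical order instead of keeping whichever orientation was encountered first. A quick sketch, where `only_reversed` is made-up sample data and the values are assumed to be mutually orderable:

```python
old_set = {(1, 2), (2, 1), (3, 4)}
new_set = {tuple(sorted(t)) for t in old_set}
# new_set == {(1, 2), (3, 4)}

# A pair that only occurs in "reversed" order is still rewritten,
# so the result can contain tuples that were not in the input.
only_reversed = {(2, 1), (4, 3)}
normalized = {tuple(sorted(t)) for t in only_reversed}
# normalized == {(1, 2), (3, 4)}
```

If the original orientation matters, the explicit loop is probably the better fit.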
Martijn Pieters

A working alternative is:

newSet = { tuple(sorted(t)) for t in oldSet }

Your solution tests membership against whatever newSet is bound to while the comprehension runs, which is still the empty set created on the previous line. The name is only rebound to the new set once the comprehension has finished.
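
You can check this by dropping the initial binding altogether. The sketch below wraps the comprehension in a hypothetical helper, `dedupe_without_prebinding`, so the name really is unbound when the condition is first evaluated:

```python
def dedupe_without_prebinding(old_set):
    """Hypothetical helper: run the comprehension with no prior new_set."""
    try:
        # new_set has no value yet while the right-hand side is evaluated.
        new_set = {(a, b) for (a, b) in old_set if (b, a) not in new_set}
    except NameError as exc:
        # Depending on the Python version this may be the UnboundLocalError
        # subclass, which `except NameError` also catches.
        return exc
    return new_set


outcome = dedupe_without_prebinding({(1, 2), (2, 1), (3, 4)})
# outcome is a NameError instance rather than a set
```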

Pynchia