Here's the test code:

units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)
units = [3, 4]
tens = [30, 40]
[x for x in nums]

Under the assumption that the generator expression on line 3 (nums = ...) forms an iterator I would expect the final result to reflect the final assigned values for units and tens. OTOH, if that generator expression were to be evaluated at line 3, producing the result tuple, then I'd expect the first definitions of units and tens to be used.

What I see is a MIX; i.e., the result is [31, 41, 32, 42]!?

Can anyone explain this behavior?

Martijn Pieters
Bill Cohagan
  • The answer is the same; `units` is an argument to the generator expression 'function', while `tens` is looked up as a global. So `units` is bound at line 3, `tens` is not. – Martijn Pieters Mar 27 '14 at 17:04
  • Note that this is not Python 3 specific. – Steven Rumbalski Mar 27 '14 at 17:15
  • @StevenRumbalski: nope, it applies to all Python versions from 2.4 onwards, where generator expressions were introduced. – Martijn Pieters Mar 27 '14 at 17:18
  • I've just discovered (from the "friend" who sent me this puzzle) that it came from http://web.archive.org/web/20111003161227/http://web.mit.edu/rwbarton/www/python.html (and referenced in http://ballingt.com/2014/03/23/surprising-python.html?utm_source=Python+Weekly+Newsletter&utm_campaign=8d370a904c-Python_Weekly_Issue_132_March_27_2014&utm_medium=email&utm_term=0_9e26887fc5-8d370a904c-312683793). I'm not yet clear on the applicable scoping rules, but will continue to pound my head against the explanations provided here until I figure it out. (I think I prefer the scoping rules in Scheme!) – Bill Cohagan Mar 27 '14 at 23:06

1 Answer

A generator expression creates a function of sorts; one with just one argument, the outermost iterable.

Here that's units, and that is bound as an argument to the generator expression when the generator expression is created.

All other names are either locals (such as a and b), globals, or closures. tens is looked up as a global, so it is looked up each time you advance the generator.

As a result, units is bound to the generator on line 3, while tens is looked up only when you iterate over the generator on the last line.
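To make the binding rule concrete, here is a minimal sketch that runs the question's code next to a hand-written generator function. The helper name `_gen` and the variable names `nums_manual`, `from_genexp`, and `from_manual` are made up for illustration; they are not part of the question.

```python
units = [1, 2]
tens = [10, 20]

# The object currently bound to `units` is handed to the generator here.
nums = (a + b for a in units for b in tens)


def _gen(outer_iter):
    # Hypothetical hand-written equivalent: the outermost iterable arrives
    # as the only argument, while `tens` is looked up by name while running.
    for a in outer_iter:
        for b in tens:
            yield a + b


nums_manual = _gen(iter(units))

# Rebinding `units` has no effect on either generator; rebinding `tens` does.
units = [3, 4]
tens = [30, 40]

from_genexp = list(nums)
from_manual = list(nums_manual)
print(from_genexp)  # [31, 41, 32, 42]
print(from_manual)  # [31, 41, 32, 42]
```

Both generators pair the original units (1 and 2) with the new tens (30 and 40), which is the mix seen in the question.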

You can see this when compiling the generator to bytecode and inspecting that bytecode:

>>> import dis
>>> genexp_bytecode = compile('(a + b for a in units for b in tens)', '<file>', 'single')
>>> dis.dis(genexp_bytecode)
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f013ae0, file "<file>", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (units)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 PRINT_EXPR
             17 LOAD_CONST               2 (None)
             20 RETURN_VALUE

The MAKE_FUNCTION bytecode turned the generator expression code object into a function, and it is called immediately, passing in iter(units) as the argument. The tens name is not referenced at all here.
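As a rough cross-check that does not depend on the exact opcodes of a given Python version, you can compare the names each code object refers to. This sketch assumes CPython, where the generator body is stored as a nested code object in the enclosing code's constants; the variable names `source`, `outer`, and `inner` are only illustrative.

```python
import types

source = "(a + b for a in units for b in tens)"
outer = compile(source, "<file>", "eval")

# Assumption: the generator body is the nested code object found among
# the constants of the enclosing code object (a CPython detail).
inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))

print(outer.co_names)     # names looked up when the generator is created
print(inner.co_names)     # names looked up while the generator runs
print(inner.co_varnames)  # the body's single argument and its locals
```

On the CPython versions I'm aware of, `units` shows up only in the enclosing code's names and `tens` only in the generator body's names, matching the disassembly above.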

This behavior is documented in the original generator expressions PEP (PEP 289):

Only the outermost for-expression is evaluated immediately, the other expressions are deferred until the generator is run:

g = (tgtexp  for var1 in exp1 if exp2 for var2 in exp3 if exp4)

is equivalent to:

def __gen(bound_exp):
    for var1 in bound_exp:
        if exp2:
            for var2 in exp3:
                if exp4:
                    yield tgtexp
g = __gen(iter(exp1))
del __gen

and in the generator expressions reference:

Variables used in the generator expression are evaluated lazily when the __next__() method is called for generator object (in the same fashion as normal generators). However, the leftmost for clause is immediately evaluated, so that an error produced by it can be seen before any other possible error in the code that handles the generator expression. Subsequent for clauses cannot be evaluated immediately since they may depend on the previous for loop. For example: (x*y for x in range(10) for y in bar(x)).
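A small sketch of the difference in timing described in both quotes: an undefined name in the leftmost for clause fails as soon as the generator expression is created, while an undefined name in a later clause fails only once iteration starts. The names `missing_outer` and `missing_inner` are deliberately undefined placeholders, and the flag variables are just for illustration.

```python
# Leftmost iterable: the name is evaluated when the generator is created.
try:
    eager = (x for x in missing_outer)  # missing_outer is never defined
    created_eager = True
except NameError:
    created_eager = False

# Inner iterable: the name is evaluated only when the generator is advanced.
lazy = (x + y for x in [1, 2] for y in missing_inner)  # no error yet
try:
    next(lazy)
    ran_lazy = True
except NameError:
    ran_lazy = False

print(created_eager, ran_lazy)  # False False
```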

The PEP has an excellent section motivating why names (other than the outermost iterable) are bound late; see Early Binding versus Late Binding.

Martijn Pieters
  • Can you please point to where this is documented? It is kind of unexpected behavior. It is clear from the disassembled code, but I'd like to read the logic behind it. – Paulo Bu Mar 27 '14 at 17:10
  • @PauloBu: The [execution model](http://docs.python.org/3/reference/executionmodel.html) and [expressions documentation](http://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries) *hint* at it; certainly that generator expressions, and list, set and dict comprehensions use a separate scope. – Martijn Pieters Mar 27 '14 at 17:12
  • It isn't unexpected behaviour. The generator references the [10,20] list because it's bound to the name you use in the generator expression. Then you bind another list [30,40] to a name that has nothing to do with the generator. – Lorenzo Gatti Mar 27 '14 at 17:13
  • Interesting. I was surprised by this. Do you know the reason why `tens` is not also closed over? It has a feeling of arbitrariness that I'm not used to when I use Python. – Steven Rumbalski Mar 27 '14 at 17:14
  • @PauloBu: The [generator expression PEP](http://www.python.org/dev/peps/pep-0289/) does spell it out. – Martijn Pieters Mar 27 '14 at 17:14
  • @StevenRumbalski: Even as a closure it would not be referenced until the loop was iterated. – Martijn Pieters Mar 27 '14 at 17:15
  • @LorenzoGatti I can understand that, but I'd rather treat both the same. That's why I asked for the docs, to see the reasoning behind this behavior. – Paulo Bu Mar 27 '14 at 17:15
  • Hmm... I was playing with this and thought perhaps I could put `tens` into another generator expression and thereby capture it: `nums = (a + b for a in units for b in (c for c in tens))`. However this did not alter the output of the OP's exercise. Looks like I'm going to have to actually think about this rather than intuit. – Steven Rumbalski Mar 27 '14 at 17:25
  • @StevenRumbalski: Because the nested generator is created for each loop over `units`. The outer loop iterable is the only object that is bound. – Martijn Pieters Mar 27 '14 at 17:26
  • @MartijnPieters: Yep, I see it now. Otherwise it would be problematic as I could only iterate over the second generator once. – Steven Rumbalski Mar 27 '14 at 17:28
  • I found reading the PEP helpful, especially the section [Early Binding versus Late Binding](http://legacy.python.org/dev/peps/pep-0289/#early-binding-versus-late-binding). I now see this design decision through a "practicality beats purity" lens. – Steven Rumbalski Mar 27 '14 at 17:34