
Yesterday I came across this odd unpacking difference between Python 2 and Python 3, and could not find an explanation after a quick Google search.

Python 2.7.8

a = 257
b = 257
a is b # False

a, b = 257, 257
a is b # False

Python 3.4.2

a = 257
b = 257
a is b # False

a, b = 257, 257
a is b # True

I know it probably does not affect the correctness of a program, but it does bug me a little. Could anyone give some insight into this difference in unpacking?

Josh Kelley
Tiensbakung
  • it really doesn't matter, you shouldn't rely on any interning behaviour... use comparison. – Karoly Horvath Dec 12 '14 at 12:43
  • Python interns numbers in the same expression, I'll find the respective code (had it already for an earlier question :)) – filmor Dec 12 '14 at 12:44
  • This might have to do with your implementation, and how it caches small integers. Have you tried it with a smaller number than 257? Was the choice of 257 random? – Nasser Al-Shawwa Dec 12 '14 at 13:03
  • This is an *internal* implementation detail. Do not rely on it. – ch3ka Dec 12 '14 at 13:04
  • @Nasser I guess 257 was explicitly chosen to not trigger the small integer optimisation. – filmor Dec 12 '14 at 13:05
  • @Nasser filmor is right, 257 is not random. If you try any number smaller than 257, you will find it always returns True. – Tiensbakung Dec 12 '14 at 13:17
  • @filmor, I think I understand what you mean, can you please explain a little bit about the rationale behind this change of the actual implementation from Python2 to Python3? Remember to post as an answer so I can accept it :) – Tiensbakung Dec 12 '14 at 13:18
  • it's just an optimization. implementation detail. – Karoly Horvath Dec 12 '14 at 13:23
  • This phenomenon doesn't really have anything to do with unpacking - it's all about interning of small integers. – PM 2Ring Dec 12 '14 at 13:31
  • This has nothing to do with integers. Python 3.x will only bother producing one object for any duplicated (immutable) literals in an expression. Try using floats or strings instead. Some literals are further interned so that the same object is used across multiple expressions. Floats are never interned and only small strings are interned. – Dunes Dec 12 '14 at 15:02

2 Answers


This behaviour is at least in part to do with how the interpreter does constant folding and how the REPL executes code.

First, remember that CPython compiles code (to an AST and then to bytecode) before it evaluates the bytecode. During compilation, the compiler looks for immutable constants, caches them in the code object, and deduplicates them. So if it sees

a = 257
b = 257

it will store a and b against the same object:

import dis

def f():
    a = 257
    b = 257

dis.dis(f)
#>>>   4           0 LOAD_CONST               1 (257)
#>>>               3 STORE_FAST               0 (a)
#>>>
#>>>   5           6 LOAD_CONST               1 (257)
#>>>               9 STORE_FAST               1 (b)
#>>>              12 LOAD_CONST               0 (None)
#>>>              15 RETURN_VALUE

Note the LOAD_CONST 1. The 1 is the index into co_consts:

f.__code__.co_consts
#>>> (None, 257)
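
The same thing can be checked without reading the disassembly, by counting how often 257 appears in co_consts and comparing the identities the function sees. This is only a minimal sketch for CPython: the name same_constant is made up for illustration, and the exact layout of co_consts differs between versions.

```python
def same_constant():
    # Both literals belong to one code object, so they share one constant slot.
    a = 257
    b = 257
    return a is b

consts = same_constant.__code__.co_consts
# Count the entries in the constant table that are the int 257.
occurrences = sum(1 for c in consts if type(c) is int and c == 257)

print(consts)
print(occurrences, same_constant())
```

If the constant is deduplicated as described, occurrences is 1 and the function reports that a and b are the same object.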

So these both load the same 257. Why doesn't this occur with:

$ python2
Python 2.7.8 (default, Sep 24 2014, 18:26:21) 
>>> a = 257
>>> b = 257
>>> a is b
False

$ python3
Python 3.4.2 (default, Oct  8 2014, 13:44:52) 
>>> a = 257
>>> b = 257
>>> a is b
False

?

Each line in this case is a separate compilation unit and the deduplication cannot happen across them. It works similarly to

compile a = 257
run     a = 257
compile b = 257
run     b = 257
compile a is b
run     a is b

As such, each of these code objects has its own constant cache. This implies that if we remove the line break, the is will return True:

>>> a = 257; b = 257
>>> a is b
True

Indeed this is the case for both Python versions. In fact, this is exactly why

>>> a, b = 257, 257
>>> a is b
True

returns True as well; it's not because of any attribute of unpacking; they just get placed in the same compilation unit.
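
The effect of compilation units can be reproduced outside the REPL with compile() and exec(). The sketch below is an approximation, not what the interactive interpreter literally does: it uses "exec" mode rather than the REPL's "single" mode, and run_units is a hypothetical helper name. The result for the separately compiled case is an implementation detail, so it is printed rather than assumed.

```python
def run_units(*sources):
    """Compile and run each source string as its own compilation unit."""
    namespace = {}
    for src in sources:
        # Every compile() call produces a separate code object
        # with its own co_consts.
        exec(compile(src, "<demo>", "exec"), namespace)
    return namespace["a"] is namespace["b"]

separate = run_units("a = 257", "b = 257")  # like two REPL lines
together = run_units("a = 257; b = 257")    # like one REPL line
unpacked = run_units("a, b = 257, 257")     # unpacking, still one unit

print(separate, together, unpacked)
```

On a recent CPython the last two values should be True, because both literals end up in the same code object. The first is typically False on standard builds but, like everything here, should not be relied on.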

This returns False for versions which don't fold properly; filmor links to Ideone which shows this failing on 2.7.3 and 3.2.3. On these versions, the tuples created do not share their items with the other constants:

import dis

def f():
    a, b = 257, 257
    print(a is b)

print(f.__code__.co_consts)
#>>> (None, 257, (257, 257))

n = f.__code__.co_consts[1]
n1 = f.__code__.co_consts[2][0]
n2 = f.__code__.co_consts[2][1]

print(id(n), id(n1), id(n2))
#>>> (148384292, 148384304, 148384496)

Again, though, this is not about a change in how the objects are unpacked; it is only a change in how the objects are stored in co_consts.
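
To see how a given interpreter stores these constants, one can walk co_consts (including nested tuples) and count the distinct objects equal to 257. This is a rough sketch with made-up helper names (unpack, collect_ints); whether a (257, 257) tuple appears in co_consts at all depends on the CPython version.

```python
def unpack():
    a, b = 257, 257
    return a is b

def collect_ints(consts, value):
    """Yield every int equal to value, looking inside nested tuples too."""
    for c in consts:
        if isinstance(c, tuple):
            yield from collect_ints(c, value)
        elif type(c) is int and c == value:
            yield c

found = list(collect_ints(unpack.__code__.co_consts, 257))
# Number of different objects behind those equal values.
distinct = len({id(obj) for obj in found})

print(unpack.__code__.co_consts)
print(len(found), distinct, unpack())
```

A distinct count of 1 means every 257 in the code object is the same object; a larger count corresponds to the older behaviour shown above.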

Community
Veedrac
  • Wow, thanks a lot, Veedrac! I really did not expect a little weirdness that bugs me to expand to such a detailed insight into how the Python interpreter works. – Tiensbakung Dec 12 '14 at 16:21

I think this is actually by accident, as I can't reproduce the behaviour with Python 3.2.

There is this issue http://bugs.python.org/issue11244 that introduces a CONST_STACK to fix problems with constant tuples with negative numbers not being optimised (look at the patches against peephole.c, which contains Python's optimiser runs).

This seems to also have led to the given behaviour. Still looking into this :)

filmor
  • Hmm, interesting! I just tested the code in a script, both Python 2 and Python 3 returned True for all cases. Guess only the interactive interpreter is a bit lazy. Anyway, I will accept it as a small optimization of Python 3 over Python 2 before I get too paranoid :) – Tiensbakung Dec 12 '14 at 13:53
  • I just tried `a,b = 257,257` in Pythons 2.5, 2.6, 2.7, 3.3, 3.4. Every one of them reported `a is b` as True in the interactive interpreter. Are you sure about your original experiments? – Ned Batchelder Dec 12 '14 at 14:17
  • Hmm, maybe IPython does something fancy here? One difference between the interactive interpreter and a script is that the former does `LOAD_CONST 257; LOAD_CONST 257`, while in the latter it's folded to `LOAD_CONST (257, 257)`. – filmor Dec 12 '14 at 14:48
  • Also, I took these two as hints: http://ideone.com/FZrgt4 (3.2), http://ideone.com/jV64I6 (2.7) – filmor Dec 12 '14 at 14:51