Python's print statement normally seems to print the repr() of its input. Tuples don't appear to be an exception:

>>> print (1, 2, 3)
(1, 2, 3)
>>> print repr((1, 2, 3))
(1, 2, 3)
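The session above uses the Python 2 `print` statement. The sketch below is a Python 3 analogue (an assumption on my part, not from the question). It captures what `print()` writes for an ordinary tuple and checks that the text is exactly what `repr()` returns.

```python
import contextlib
import io

t = (1, 2, 3)

# Capture what print() writes so it can be compared with repr().
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print(t)

# For an ordinary tuple, print() writes the same text repr() returns.
assert buf.getvalue() == repr(t) + "\n"
print(buf.getvalue(), end="")  # (1, 2, 3)
```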

But then I stumbled across some strange behavior while messing around with CPython's internals. In short: if you trick Python 2 into creating a self-referencing tuple, printing it directly behaves completely differently from printing its repr() / str() / unicode() representations.

>>> print outer   # outer is the self-referencing tuple from the linked Gist
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
... many lines later ...
((((((((((Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError: stack overflow
>>> print repr(outer)
((...),)
>>> print str(outer)
((...),)
>>> print unicode(outer)
((...),)

So what exactly is print doing? In an attempt to answer this question myself, I referred to the language reference:

6.6. The print statement

print evaluates each expression in turn and writes the resulting object to standard output (see below). If an object is not a string, it is first converted to a string using the rules for string conversions.

And the rules for string conversions are:

5.2.9. String conversions

A string conversion is an expression list enclosed in reverse (a.k.a. backward) quotes:

string_conversion ::=  "`" expression_list "`"

But enclosing outer in back quotes has the same result as calling repr() and friends. No dice. So what the heck is print actually doing behind the scenes?

(Interestingly, the behavior is "fixed" in Python 3: printing a self-referencing tuple gives the ellipsis-truncated form.)
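Python 3 has no supported way to build a self-referencing tuple from pure Python code. A self-referencing list, used here as a stand-in (not the tuple from the question), still shows the truncated form coming out of `print()`. This is a minimal sketch:

```python
import contextlib
import io

# A list that contains itself; a stand-in for the self-referencing tuple.
lst = []
lst.append(lst)

# Capture what print() actually writes.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print(lst)

printed = buf.getvalue()
print(repr(printed))  # '[[...]]\n'
```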

ashastral
  • I got `struct.error: 'I' format requires 0 <= number <= 4294967295` when I tried your code. – thefourtheye Dec 21 '13 at 06:37
  • You're probably running a 64-bit build of Python. Replacing both instances of 'I' with 'Q' should, in theory, fix that. – ashastral Dec 21 '13 at 06:40
  • Now it fails at `c_outer[inner_index:inner_index+4] = struct.pack('Q', id(outer))` with `ValueError: Can only assign sequence of same size` – thefourtheye Dec 21 '13 at 06:41
  • Try replacing `+4` with `+8`. **EDIT:** I've updated the linked Gist and it should now work on both 32-bit and 64-bit platforms. – ashastral Dec 21 '13 at 06:44
  • +1 it failed with segmentation fault, after printing a huge chunk of `(`. – thefourtheye Dec 21 '13 at 06:55
  • This is most likely a bug. Why don't you report this? – thefourtheye Dec 21 '13 at 07:00
  • I doubt it's significant enough to warrant a fix, given that the only way to reproduce it is by messing around with pointers. I'm more interested in why there are apparently two behaviors for stringifying tuples, one of which breaks under a very obscure corner case. – ashastral Dec 21 '13 at 07:04
  • It's already been reported and rejected as `not to be fixed` in Python 2. `http://bugs.python.org/issue1069092` – Ned Deily Dec 21 '13 at 07:22
  • That bug report deals with many tuples that are deeply nested inside one another, not a single tuple nested inside itself. The stack overflow in my example is really just a side effect of the actual bug. – ashastral Dec 21 '13 at 07:33
  • Sorry, I should have looked closer. A stack trace shows that the stack overflow results from a recursive call loop between `internal_print` (around `object.c:315`) and `tupleprint` (around `tupleobject.c:253`). The problem of recursive container reprs was fixed in Python 3.2: http://bugs.python.org/issue9840 – Ned Deily Dec 21 '13 at 08:11

You can find out what is actually happening by disassembling the Python bytecode:

>>> from dis import dis
>>> dis(compile('print outer', '<string>', 'exec'))
  1           0 LOAD_NAME                0 (outer)
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
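For contrast, here is a sketch that disassembles the Python 3 form `print(outer)`. It assumes Python 3.4+ for `dis.get_instructions`. There is no `PRINT_ITEM` opcode, because `print` is an ordinary builtin function call. Exact opcode names vary between 3.x releases (for example `CALL_FUNCTION` before 3.11 and `CALL` afterwards), so the code only checks for a `CALL*` prefix.

```python
import dis

# `outer` only has to exist at run time, so compiling without it is fine.
code = compile("print(outer)", "<string>", "exec")

# Collect just the opcode names; the exact list depends on the Python version.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# Python 3 has no PRINT_ITEM opcode: print() is a regular function call.
assert "PRINT_ITEM" not in opnames
```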

Then read the CPython source that implements those opcodes.

PRINT_ITEM eventually reaches this block of code in `internal_print` (Objects/object.c in the Python 2 source):

else if (Py_TYPE(op)->tp_print == NULL) {
    PyObject *s;
    if (flags & Py_PRINT_RAW)
        s = PyObject_Str(op);
    else
        s = PyObject_Repr(op);
    ...
}
else
    ret = (*Py_TYPE(op)->tp_print)(op, fp, flags);

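This means that `__str__` or `__repr__` is called only if the object's type does not have a `tp_print` function, and the tuple type has one. According to the stack trace in the comments, that function (`tupleprint`) calls back into `internal_print` for each element. For a tuple that contains itself, the two keep calling each other until the C stack runs out, whereas `repr()` stops at `(...)`.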
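The difference between the two paths can be modeled in pure Python. This is a hypothetical sketch, not CPython code, and `SelfRef`, `guarded_repr` and `unguarded_print` are made-up names. `guarded_repr` remembers which objects it is currently formatting and emits `(...)` on re-entry, which reproduces the `((...),)` output from the question. `unguarded_print` assumes, as the traceback suggests, that the `tp_print` path has no such check. It keeps writing `(` and recursing until an artificial depth limit stands in for the C stack overflow.

```python
# Hypothetical model of the two CPython 2 code paths; not actual CPython code.


class SelfRef:
    """Stand-in for a one-element tuple that contains itself (like `outer`)."""

    def __init__(self):
        self.items = [self]


def guarded_repr(obj, _active=None):
    """Repr-style formatter with a recursion guard, like repr() giving "(...)"."""
    if not isinstance(obj, SelfRef):
        return repr(obj)
    if _active is None:
        _active = set()
    if id(obj) in _active:
        # Re-entered an object that is already being formatted: truncate.
        return "(...)"
    _active.add(id(obj))
    try:
        parts = [guarded_repr(item, _active) for item in obj.items]
    finally:
        _active.discard(id(obj))
    # One-element tuples get a trailing comma, e.g. "(x,)".
    return "(" + ", ".join(parts) + ("," if len(obj.items) == 1 else "") + ")"


def unguarded_print(obj, out, depth=0, limit=200):
    """tp_print-style printer with no cycle check: writes "(" and recurses."""
    if depth > limit:
        # Stands in for the C stack running out ("MemoryError: stack overflow").
        raise RecursionError("model stack overflow")
    if not isinstance(obj, SelfRef):
        out.append(repr(obj))
        return
    out.append("(")
    for index, item in enumerate(obj.items):
        if index:
            out.append(", ")
        unguarded_print(item, out, depth + 1, limit)
    if len(obj.items) == 1:
        out.append(",")
    out.append(")")


if __name__ == "__main__":
    outer = SelfRef()
    print(guarded_repr(outer))  # ((...),)
    written = []
    try:
        unguarded_print(outer, written)
    except RecursionError as exc:
        # Only opening parentheses were written before the "overflow".
        print("".join(written)[:40] + "...", exc)
```

The depth limit is only there so the model fails quickly and cleanly. In the real interpreter the equivalent failure is the `MemoryError: stack overflow` shown in the question, or a segmentation fault as reported in the comments.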

If you want to understand the internals of CPython, the best way is to read the source code. I also recommend a series of tutorials on Python internals; it explains everything you need to know to fully understand the output of Python's dis function.

Blin