Simple use of Python's str.format() method:

>>> '{0}'.format('zero')
'zero'

Hex, octal, and binary literals do not work:

>>> '{0x0}'.format('zero')
KeyError: '0x0'
>>> '{0o0}'.format('zero')
KeyError: '0o0'
>>> '{0b0}'.format('zero')
KeyError: '0b0'
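The KeyError suggests that these field names are not rejected as malformed. They are looked up as keyword arguments instead. A quick way to confirm this (Python 3 syntax; the keyword values are made up for the demonstration) is to supply keywords with exactly those names via `**`:

```python
# Keys that are not valid identifiers can still be passed through **kwargs.
values = {'0x0': 'hex', '0o0': 'octal', '0b0': 'binary'}

# Each field name is treated as a keyword name, not as a positional index,
# so the positional argument 'zero' is never used.
hex_result = '{0x0}'.format('zero', **values)
oct_result = '{0o0}'.format('zero', **values)
bin_result = '{0b0}'.format('zero', **values)
print(hex_result, oct_result, bin_result)  # hex octal binary
```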

According to the replacement field grammar, though, they should:

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s"
format_spec       ::=  <described in the next section>
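The standard library exposes the tokenizer for this grammar as `string.Formatter().parse()`, which yields `(literal_text, field_name, format_spec, conversion)` tuples. Running it on a field that uses a hex literal suggests the name is accepted syntactically and kept as the raw string `'0x0'`; nothing at this stage turns it into a number:

```python
from string import Formatter

# Split one replacement field into its grammar components.
parts = list(Formatter().parse('{0x0!r:>5}'))
print(parts)  # [('', '0x0', '>5', 'r')]
```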

The integer grammar is as follows:

longinteger    ::=  integer ("l" | "L")
integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
nonzerodigit   ::=  "1"..."9"
octdigit       ::=  "0"..."7"
bindigit       ::=  "0" | "1"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
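To make the mismatch concrete, here is a sketch of what the documented `integer` rule would accept, written as a regular expression over the productions above. `documented_integer` is a hypothetical helper for illustration only; `str.format()` does not call anything like it. It follows the Python 2 grammar quoted here, where `0` followed by octal digits is an old-style octal literal:

```python
import re

# One alternative per production in the quoted integer grammar.
_INTEGER = re.compile(r'''
    0[xX][0-9a-fA-F]+   # hexinteger
  | 0[bB][01]+          # bininteger
  | 0[oO][0-7]+         # octinteger, new style
  | 0[0-7]+             # octinteger, old style
  | [1-9][0-9]*         # decimalinteger
  | 0                   # decimalinteger
''', re.VERBOSE)


def documented_integer(text):
    """Return the value the documented grammar implies, or None if no match."""
    if not _INTEGER.fullmatch(text):
        return None
    if re.fullmatch(r'0[0-7]+', text):
        return int(text, 8)  # old-style octal, e.g. "010" -> 8
    return int(text, 0)      # hex, binary, new-style octal, or decimal


for name in ['0x0', '0o0', '0b0', '010', '08']:
    print(name, documented_integer(name))
```

Under this reading, `'{0x0}'` would select positional argument 0, `'{010}'` would select argument 8, and `'{08}'` would be malformed.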

Have I misunderstood the documentation, or does Python not behave as advertised? (I'm using Python 2.7.)

Apalala
davidchambers
  • That certainly looks like a mistake. Also, `08` and `010` seem to be interpreted as indexes 8 and 10, rather than a syntax error and 8. If this isn't explained in the text, you should file a documentation bug. (However, I'd first check whether this is still wrong in 3.4, and search for existing bugs and discussions on the python-dev list.) – abarnert Oct 05 '13 at 22:20
  • From a quick look at the code, it looks like it's not using the parser on these fields (or the `format_spec`), but instead using a custom [`get_integer`](http://hg.python.org/cpython/file/2.7/Objects/stringlib/formatter.h#l65) function, which basically just processes `("0"..."9") +`. – abarnert Oct 05 '13 at 22:29
  • I've just confirmed that the behaviour is the same in Python 3.4.0a1. Time to scour the python-dev list. :) – davidchambers Oct 05 '13 at 22:41
  • After searching through bug reports, I don't think anyone has noticed this for `arg_name`—and, while they have noticed the exact same problem for `element_index`, no one suggested fixing this part of its grammar. – abarnert Oct 05 '13 at 23:02
  • http://bugs.python.org/issue19175 – davidchambers Oct 06 '13 at 04:16


This looks like a mistake in the grammar, and the surrounding text doesn't clarify it; it just describes the field as "a number or an identifier" and explains how it is interpreted if it is a number.

Testing it out, the field is clearly not treated as an integer:

>>> '{08}'.format(*range(10)) # should be SyntaxError
'8'
>>> '{010}'.format(*range(10)) # should be '8'
'10'
>>> '{-1}'.format(*range(10)) # should be '9', but looked up as a string
KeyError: '-1'
>>> '{1 }'.format(*range(10)) # should be '1', but looked up as a string
KeyError: '1 '
>>> '{10000000000000000000}'.format(1) # should be IndexError
ValueError: Too many decimal digits in format string

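Looking at the code, it doesn't borrow from the Python parser to parse format strings; it uses custom parsing. The code that interprets an arg_name as a number is a get_integer function that just converts each decimal digit in turn, multiplying the running total by 10 and adding the digit, until the field ends or the next digit would push the value past PY_SSIZE_T_MAX (at which point it raises ValueError). If it meets any non-digit character, it gives up, and the field is treated as a keyword name instead.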
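The digit loop described above can be mirrored in a few lines of Python. This is a hypothetical port for illustration: the name `get_integer` is borrowed from the C source, `max_value` stands in for `PY_SSIZE_T_MAX`, and it treats only ASCII `0` to `9` as digits, which may be narrower than the C version's digit test:

```python
import sys


def get_integer(field, max_value=sys.maxsize):
    """Sketch of the C helper: return the index, or -1 if not all digits."""
    if not field:
        return -1  # the C helper also returns -1 for an empty string
    accumulator = 0
    for ch in field:
        if not '0' <= ch <= '9':
            return -1  # any non-digit means "look this up as a keyword"
        digit = ord(ch) - ord('0')
        # Refuse to go past max_value instead of silently overflowing.
        if accumulator > (max_value - digit) // 10:
            raise ValueError('Too many decimal digits in format string')
        accumulator = accumulator * 10 + digit
    return accumulator


print(get_integer('08'), get_integer('010'), get_integer('-1'))  # 8 10 -1
```

Leading zeros are simply absorbed, and `-`, spaces, and `x`/`o`/`b` prefixes all make it return -1, which matches the session output above.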

PEP 3101 suggests that this is intentional:

Simple field names are either names or numbers. If numbers, they must be valid base-10 integers …

It doesn't specifically say that the number must not be too close to the maximum index value, nor that negative indices can't be used. But most of the other quirks are explained by the "valid base-10 integer" wording, as opposed to the grammar's plain "integer". In fact, just describing arg_name as digit+ instead of integer would resolve all of the quirks.
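One way to check the digit+ reading against the real implementation: classify each field name with a `[0-9]+` pattern, predict whether `str.format()` will index the positional arguments or look the name up as a keyword, and compare with what actually happens. The keyword values are made up for the test, and the overflow case is left out because it raises ValueError rather than falling back to a keyword:

```python
import re

args = list(range(20))
# Keyword arguments whose names match the quirky field names.
kwargs = {'0x0': 'hex', '-1': 'negative', '1 ': 'trailing space'}


def predicted(field):
    """Digits-only names index args as decimal; anything else is a keyword."""
    if re.fullmatch(r'[0-9]+', field):
        return str(args[int(field)])  # int() with base 10 ignores leading zeros
    return kwargs[field]


for field in ['08', '010', '0x0', '-1', '1 ']:
    actual = ('{' + field + '}').format(*args, **kwargs)
    print(repr(field), actual, actual == predicted(field))
```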

The element_index is parsed in exactly the same way as the arg_name. Issue #8985 says that element_index intentionally "… uses the narrowest possible definition for integer indexes, in order to pass all other strings to mappings." Whether that's also intentional for arg_name, or whether it's an unintended consequence of using the same code, I'm not sure.
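The same narrow rule shows up inside brackets. A digits-only index is converted to an int, while anything else, including `-1` or a hex literal, is passed through as a string key. That is harmless for a dict but fails for a list (the data below is only for illustration):

```python
letters = ['a', 'b', 'c']
lookup = {'-1': 'minus one', '0x1': 'hex one'}

print('{0[1]}'.format(letters))    # b: "1" is all digits, so it becomes int 1
print('{0[-1]}'.format(lookup))    # minus one: "-1" is passed as a str key
print('{0[0x1]}'.format(lookup))   # hex one: "0x1" is passed as a str key

try:
    '{0[-1]}'.format(letters)      # the list is indexed with the str "-1"
except TypeError as exc:
    print('TypeError:', exc)
```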

The docs are unchanged in 3.4, and the code is effectively unchanged in the current trunk.

I'd suggest searching the bug tracker and the python-dev archives to see if this has been raised before. And, if not, figure out whether you think the docs or the code should be changed, file a bug, and ideally submit a patch.

abarnert
  • This [comment](http://bugs.python.org/issue8985#msg107705) by Eric Smith makes a good case for the current behaviour: "get_integer uses the narrowest possible definition for integer indexes, in order to pass all other strings to mappings." I'll file a documentation bug and look into making the fix myself. Thanks for the useful links, @abarnert. – davidchambers Oct 05 '13 at 22:56
  • @davidchambers: Yeah, I found the same comment and edited it into the answer. If you haven't read the edited version, I also found some text in PEP 3101 that seems relevant. Anyway, I think you're right that the evidence shows it's a documentation bug, not an implementation bug. – abarnert Oct 05 '13 at 23:04