Is there a way to hook a .decode call into the format specification? There may be reasons, e.g. a large buffer, not to decode everything up front, and it can be inconvenient to call decode on every single argument.

In [473]: print(b'hello world' + b', John')
b'hello world, John'

But:

In [475]: print('{}, {}'.format(b'hello world', b'John'))
b'hello world', b'John'

The result is still a str: each bytes argument is inserted as its repr, with the 'b' prefix and quotes included, so encoding it gives:

In [477]: print('{}, {}'.format(b'hello world', b'John').encode())
b"b'hello world', b'John'"

Edit: something like this is also possible, but blindly wrapping every argument in a try/except is rather bad:

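def decoder_step(s):
    # Decode bytes; return anything that cannot be decoded unchanged.
    try:
        return s.decode()
    except (AttributeError, UnicodeDecodeError):
        return s
decoder = lambda x: tuple(decoder_step(s) for s in x)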

In [3]: "{} {} {}".format(*decoder([b'foo', 3, b'bar', 'man']))
Out[3]: 'foo 3 bar'
user3467349

2 Answers

The behavior you are looking for does not currently exist in Python. There is no way to use a Unicode format string and have bytes objects decoded automatically as they are inserted. This is a design decision, since automatic decoding is often a source of bugs (what should the code do if the bytes cannot be decoded with the default encoding?). If you want to insert encoded text into a Unicode string, decode it properly first!
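A minimal sketch of decoding first, assuming the bytes are UTF-8 (the encoding is an assumption here; use whatever your data source actually produces):

```python
# Assumed inputs: UTF-8 encoded bytes, as in the question.
greeting = b'hello world'
name = b'John'

# Decode explicitly, choosing the encoding yourself.
text = '{}, {}'.format(greeting.decode('utf-8'), name.decode('utf-8'))
print(text)  # hello world, John

# Bytes that are not valid UTF-8 raise UnicodeDecodeError by default;
# errors='replace' substitutes U+FFFD instead of raising.
broken = b'caf\xe9'  # Latin-1 bytes, not valid UTF-8
recovered = broken.decode('utf-8', errors='replace')
```

Picking the error policy at the call site is what answers the "what should the code do" question above, which an implicit decode could not do for you.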

However, thanks to PEP 461, Python 3.5 and later allow bytes objects to use the older style of text formatting with the % operator. So b"%s %s" % (b"Hello", b"World") works, creating a new bytes object. This functionality is intended mostly for implementing protocols like HTTP and SMTP, which are specified as using ASCII text for their commands and responses. If you're dealing with user data rather than human-readable binary protocols, you shouldn't be doing any string formatting with bytes objects. Use Unicode everywhere except at the bare metal (and even there, Python's IO code can often handle the encoding and decoding for you).
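For illustration, a short sketch of bytes %-formatting on Python 3.5+; the request line and header below are made up for the example and are not meant as a complete HTTP request:

```python
# Requires Python 3.5+ (PEP 461): %-formatting on bytes objects.
line = b"%s %s" % (b"Hello", b"World")
print(line)  # b'Hello World'

# Protocol-style use: build an ASCII message without decoding anything.
# The method, path, and header here are illustrative assumptions.
method, path = b"GET", b"/index.html"
request = b"%s %s HTTP/1.1\r\nContent-Length: %d\r\n\r\n" % (method, path, 0)
print(request)
```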

Blckknght

Well, there is:

In [1]: print(bytes('{}, {}'.format('hello world', 'John'),'utf-8'))
b'hello world, John'

Is that what you need, or did you want something that works only within format?

brainovergrow
  • That doesn't work if the arguments are `b'hello world'` and `b'John'`. I was looking for something within format, like a (non-existent) specifier `{B}` that calls .decode(), or perhaps a way to hook a function into the format specification. There doesn't seem to be a simpler alternative to calling `.decode()` on all your bytes objects before passing them to format, though. – user3467349 Mar 01 '15 at 22:42
  • I see now. If you really need it, though, there is always the possibility of [subclassing string.Formatter](http://stackoverflow.com/questions/21664318/subclass-string-formatter). – brainovergrow Mar 02 '15 at 04:36
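
A rough sketch of the string.Formatter route mentioned in the last comment. The class name `DecodingFormatter` and the `!B` conversion are hypothetical, invented here for illustration; they are not part of the standard library, and plain `str.format` still rejects `!B`:

```python
import string

class DecodingFormatter(string.Formatter):
    """Hypothetical helper: adds a '!B' conversion that decodes bytes."""

    def __init__(self, encoding='utf-8', errors='strict'):
        super().__init__()
        self.encoding = encoding
        self.errors = errors

    def convert_field(self, value, conversion):
        # 'B' is our own (assumed) conversion flag; everything else
        # ('r', 's', 'a', or None) falls through to the default handling.
        if conversion == 'B':
            return value.decode(self.encoding, self.errors)
        return super().convert_field(value, conversion)

fmt = DecodingFormatter()
result = fmt.format('{0!B}, {1!B}', b'hello world', b'John')
print(result)  # hello world, John
```

Only the fields marked with `!B` are decoded, so a large buffer passed as another argument is left alone unless the format string asks for it.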