Is there a difference between the `%` format operator and `str.format()` in Python regarding Unicode and UTF-8 encoding?

Question

Assume that

n = u"Tübingen"
repr(n)  # u'T\xfcbingen', a unicode object
i = 1 # integer

The first of the following files raises

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 82: ordinal not in range(128)

When I pass `n.encode('utf8')` instead of `n`, it works.

The second works flawlessly in both cases.

#!/usr/bin/env python -B
# encoding: utf-8
# Python File 1

print '{id}, {name}'.format(id=i, name=n)  # raises UnicodeEncodeError

#!/usr/bin/env python -B
# encoding: utf-8
# Python File 2

print '%i, %s' % (i, n)

Since the documentation encourages using `format()` instead of the `%` format operator, I don't understand why `format()` seems more "handicapped" here. Does `format()` only work with UTF-8-encoded strings?

When you did `u'{id}, {name}'.format(id=i, name=n)` what did you observe? Note that the formatting string is a Unicode string `u'...'`. Please add that to your examples and comment on it. — S.Lott, Dec 22 '11 at 11:40
Thank you S.Lott, this was it. I understand now where my mistake was. `'{id}, {name}'` was a utf-8 string (defined by the *magic line* `# encoding: utf-8`) and `n` was in unicode. It is not possible to "concatenate" them. That is why `n.encode('utf8')` worked. Right? — Aufwind, Dec 22 '11 at 11:44
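Encoding `n` makes both operands byte strings, which works, but any later mix of that non-ASCII byte string with `unicode` text can fail again, this time with a `UnicodeDecodeError`. A common alternative is to decode byte strings to text as soon as they enter the program and format only text. A minimal sketch, assuming UTF-8 input; `to_text` is a hypothetical helper name, and the code is meant to run on Python 2.6+ and 3.3+:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.6+ and 3.3+; `to_text` is a hypothetical helper name.


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with `encoding`.

    On Python 2, `bytes` is an alias of `str`, so plain '...' literals
    are decoded too; text (unicode) is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


template = b"{id}, {name}"              # a byte string, like '...' in a Python 2 file
raw_name = u"Tübingen".encode("utf-8")  # UTF-8 bytes, e.g. read from a file

# Decode both inputs first, then format text with text.
line = to_text(template).format(id=1, name=to_text(raw_name))
assert line == u"1, Tübingen"
print(line)
```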

score 10 · Accepted Answer · answered Dec 22 '11 at 11:41

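You're calling `str.format` on a byte string (`str`), but `n` is a `unicode` object, not a `str`. In Python 2, `str.format` converts the formatted `unicode` value back to `str` with the default ASCII codec, and that conversion fails on `ü`.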
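In Python 2 the failing step is an implicit encode of the `unicode` result with the ASCII codec. A minimal sketch that triggers the same kind of `UnicodeEncodeError` explicitly (it runs on Python 2 or 3); the reported position is the index of `ü` within the string being encoded:

```python
# -*- coding: utf-8 -*-
# Performs the implicit ASCII encode explicitly; runs on Python 2 and 3.
n = u"Tübingen"

error = None
try:
    n.encode("ascii")  # roughly what Python 2's str.format does with a unicode result
except UnicodeEncodeError as exc:
    error = exc

# u'\xfc' ('ü') sits at index 1 and is outside the ASCII range 0-127.
assert error is not None
assert error.start == 1 and error.object[error.start] == u"\xfc"
print(error)
```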

print u'{id}, {name}'.format(id=i, name=n)

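will work, since it uses `unicode.format` instead. The `%` operator behaves differently: if the format string or any `%s` argument is `unicode`, the result is promoted to `unicode`, which is why the second file works unchanged.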
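A minimal sketch of the fix, written so that it runs on both Python 2 (2.6+) and Python 3 (3.3+, which accepts the `u''` prefix). With a `unicode` format string, `format()` and `%` produce the same text:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.6+ and Python 3.3+ (u'' literals are accepted by both).

n = u"Tübingen"  # unicode in Python 2, str in Python 3
i = 1

# With a unicode format string neither approach needs an implicit
# ASCII encode on Python 2, so both produce the same text.
via_format = u"{id}, {name}".format(id=i, name=n)
via_percent = u"%i, %s" % (i, n)

assert via_format == via_percent == u"1, Tübingen"
print(via_format)
```

In Python 3 every `str` is already Unicode, so the `u` prefix is redundant there but harmless.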

Tom van der Woerdt
