UnicodeEncodeError when formatting u'ES SIOUF_1' in Python 2

Question

I have this code:

"'{}'".format(u'ES SIOUF_1')

When run in Python 2, I receive the following error:

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 2: ordinal not in range(128)

The same code run in Python 3 gives:

>>> "'ES\xa0SIOUF_1'"

I don't want either of these. What I need is:

>>> "'ES SIOUF_1'"

I have read many questions about encoding and decoding characters in Python, and about how Python 2 and 3 differ in this regard.

However, I honestly don't understand them, and I'd like to solve this problem for both versions of Python if possible.

The thing I've noticed is that doing:

type(u'ES SIOUF_1')

gives:

>>> <type 'unicode'> # PYTHON 2
>>> <class 'str'> # PYTHON 3

Must have been a typo (an extra char between the braces), because it works for me (if i copy/paste your string). — CristiFati, Sep 25 '18 at 12:37
Well, actually, it is "'{}'".format(u'ES SIOUF_1') so it is correct. How could it work for you and not for me? — umbe1987, Sep 25 '18 at 12:50
OK, this is weird... When I copy and paste the code I wrote here into my IDE (PyScripter), I don't get the error. However, when I copy and paste the same code from the command history, I do. WinMerge shows a difference in the whitespace character, but to my eyes they look identical. What's going on? Could it be that the space in the copied text is a different kind of space? — umbe1987, Sep 25 '18 at 13:10

score 1 · Accepted Answer · answered Sep 25 '18 at 13:21

You have fallen into a corner-case trap. Unicode defines U+00A0 (u'\xa0' in Python notation) as the NO-BREAK SPACE character. It renders exactly like a normal space (U+0020 or u'\x20') but is a distinct character and is not in the ASCII range.
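
One way to confirm which kind of space a string actually contains is to list each character's code point and Unicode name. This is a minimal sketch using the standard unicodedata module; the helper name describe_chars is hypothetical, and the sample string uses an explicit \xa0 escape so that the character survives copy and paste:

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


sample = u"ES\xa0SIOUF_1"  # explicit escape instead of an invisible character
for code_point, name in describe_chars(sample):
    # Single-argument print works the same on Python 2 and Python 3.
    print("%s %s" % (code_point, name))
```

The third line of output should identify the character as NO-BREAK SPACE rather than SPACE, which is the difference WinMerge was reporting.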

For reasons I cannot guess (maybe a copy and paste), a no-break space ended up in your unicode string, hence the \xa0 escape in the Python 3 output and the failure to convert the string to ASCII in Python 2. Because the format string is a plain (byte) string in your Python 2 code, the unicode argument is implicitly encoded to ASCII, which raises the exception. So in Python 2 you need a unicode format string to avoid the error:

u"'{}'".format(u'ES SIOUF_1')

works in Python 2 just as it does in Python 3.
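
The implicit conversion can be made visible by performing it by hand. The sketch below is meant to behave the same way on Python 2 and Python 3: encoding to ASCII explicitly raises the same kind of UnicodeEncodeError that Python 2 raises behind the scenes, while a unicode format string never goes through that step. The variable names are only illustrative.

```python
text = u"ES\xa0SIOUF_1"  # no-break space written as an explicit escape

# A unicode format string keeps everything as unicode: no encoding step.
formatted = u"'{}'".format(text)
print(repr(formatted))

# Python 2 does roughly this when the format string is a byte string.
error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; Python 3 clears `exc` after the block
    print("encode failed: %s" % exc)
```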

How to fix?

The correct way is to get rid of the offending u'\xa0' before trying to process the string. If you cannot, you can replace it explicitly with a normal space:

"'{}'".format(u'ES SIOUF_1'.replace(u'\xa0', u'\x20'))

should give what you want in both Python 2 and Python 3.
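
If no-break spaces can show up in more than one place, the replacement can be wrapped in a small helper. This is only a sketch: the name normalize_spaces is hypothetical, and treating every character in Unicode category Zs (space separators) as a plain space is an assumption that goes beyond the single U+00A0 discussed here.

```python
import unicodedata


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with U+0020."""
    return u"".join(u" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


cleaned = normalize_spaces(u"ES\xa0SIOUF_1")
# After cleaning, the text is pure ASCII, so a byte-string format is safe
# even on Python 2.
print(repr("'{}'".format(cleaned)))
```

If only U+00A0 should be touched, the plain .replace(u'\xa0', u'\x20') call is the narrower and safer choice.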

Wow, I am amazed!... Thank you. — umbe1987, Sep 25 '18 at 13:32
