This is flabbergasting and extremely frustrating. Please help.
>>> a1 = '\xe5' # type <str>
>>> a2 = u'\xe5' # type <unicode>
>>> ord(a1)
229
>>> ord(a2)
229
>>> print a2.encode('utf-8')
å
>>> print a1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
If a1 and a2 have the same value, why can't both be encoded?
I have to use an external API that returns its text data in the a1
form (<str> rather than <unicode>), which makes it useless to me.
Python apparently insists that <str> strings contain only ASCII
characters, or it refuses to encode them. This completely breaks my application.
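
For reference, here is a minimal sketch of what seems to be going on. In Python 2, calling .encode() on a <str> first decodes it implicitly with the ASCII codec, which would explain why the error is a UnicodeDecodeError rather than an encode error. The sketch assumes the API's bytes are Latin-1 (ISO-8859-1) encoded, since 0xe5 is å in that encoding; the actual encoding used by the API is not known, and the names raw, text, and utf8_bytes are just illustrative.

```python
# Assumption: the external API returns Latin-1 encoded bytes.
# Substitute the real encoding if the API documents a different one.
raw = b'\xe5'                      # same bytes as a1 (a <str> in Python 2)

# Decode explicitly instead of relying on the implicit ASCII decode.
text = raw.decode('latin-1')       # bytes -> unicode, same value as a2

# Now the value can be encoded to UTF-8 like a2.
utf8_bytes = text.encode('utf-8')
```

Decoding once at the API boundary and working with <unicode> everywhere else would keep the implicit ASCII step from ever being triggered.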