In both cp1250
and latin2
, there is no character corresponding to the byte \x88
(cf. gray cells in the code page tables). Yet, if I try to decode this byte using the two encodings in Python 3, I get different results. The first encoding yields an error:
>>> b"\x88".decode("cp1250")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/encodings/cp1250.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x88 in position 0: character maps to <undefined>
Which makes sense, since there really is no character defined for that byte. However, the second encoding returns a character corresponding to Unicode codepoint \u0088
, even though it shouldn't be defined either:
>>> b"\x88".decode("latin2")
'\x88'
>>> "\x88" == "\u0088"
True
Why is that?