
I'm trying to decode a byte string after detecting which encoding it uses.

The string is:

\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03

The detected encoding is big5, but when I try to decode it using string.decode('big5'), it throws the following error:

UnicodeDecodeError: 'big5' codec can't decode bytes in position 12-13: illegal multibyte sequence
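
For reference, here is a minimal Python 3 sketch that reproduces the failure and reads the offset from the exception object rather than from the message. In Python 3 a byte string is a `bytes` object, so the call is `data.decode('big5')`. `locate_decode_error` is a hypothetical helper name. `start`, `end` and `reason` are standard `UnicodeDecodeError` attributes. The `end` offset may differ between Python versions, so `start` is the value worth relying on.

```python
# Minimal sketch, assuming Python 3; locate_decode_error is a hypothetical helper.
from typing import Optional, Tuple

data = b"\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03"


def locate_decode_error(raw: bytes, encoding: str) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, reason) of the first decode failure, or None if raw decodes."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start/err.end delimit the rejected bytes; err.reason is the message text
        return err.start, err.end, err.reason
    return None


failure = locate_decode_error(data, "big5")
print(failure)
if failure is not None:
    start = failure[0]
    # Show the two bytes the codec tried to read as one Big5 pair
    print(data[start:start + 2])
```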

How can I solve this issue?
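
One way to at least see which parts of the string are affected, without fixing the underlying data, is to pass a standard error handler to `decode`. This sketch assumes Python 3.5 or later (needed for `backslashreplace` when decoding). The result is lossy and does not tell you whether big5 is the right encoding in the first place.

```python
data = b"\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03"

# errors="replace": each rejected chunk becomes U+FFFD and decoding continues
replaced = data.decode("big5", errors="replace")

# errors="backslashreplace": rejected bytes stay visible as \xNN escapes
escaped = data.decode("big5", errors="backslashreplace")

print(repr(replaced))
print(repr(escaped))
```

`replace` is convenient for display. `backslashreplace` keeps the original byte values, which makes it easier to compare against the byte string above.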

sjakobi
SilentFlame
  • The _detected_ encoding? How did you detect it? The error message says that it cannot decode the bytes at positions 12 and 13, which are `\xeb\x03`. This makes sense: the big5 encoding stores byte pairs. The first byte in such a pair can take values in the ranges [0xa1, 0xc6] and [0xc9, 0xf9]. The second byte can take values in the ranges [0x40, 0x7e] and [0xa1, 0xfe]. The second byte of your pair is 0x03, which is clearly not in the allowed range. Edit: Hmm, some more of your pairs are out of range too; maybe I need to check that again. – pschill Jun 07 '18 at 07:07
  • Where does the string come from? Decoding succeeds if I use just the first 12 bytes, but I can't judge whether the characters make sense (a triangle and two Chinese characters). – Arndt Jonasson Jun 07 '18 at 07:08
  • Some bytes in your string are null (`\x00`). Maybe the string is null-terminated? Then the string would only consist of the first 8 bytes, which can be decoded successfully. – pschill Jun 07 '18 at 07:14
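
pschill's byte-range argument can be checked mechanically across the whole string. This is a sketch that uses only the lead and trail ranges quoted in that comment, which simplify Big5. A real codec's mapping table may still reject some pairs that fall inside these ranges. `find_invalid_big5_pairs` is a hypothetical helper. It treats bytes below 0x80 as single-byte ASCII and, after a bad byte, resynchronises one byte later.

```python
from typing import List, Tuple

data = b"\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03"

# Byte ranges quoted in pschill's comment (a simplification, not the full Big5 table)
LEAD_RANGES: Tuple[Tuple[int, int], ...] = ((0xA1, 0xC6), (0xC9, 0xF9))
TRAIL_RANGES: Tuple[Tuple[int, int], ...] = ((0x40, 0x7E), (0xA1, 0xFE))


def in_ranges(value: int, ranges: Tuple[Tuple[int, int], ...]) -> bool:
    return any(lo <= value <= hi for lo, hi in ranges)


def find_invalid_big5_pairs(raw: bytes) -> List[int]:
    """Return offsets where a byte >= 0x80 does not start a well-formed lead/trail pair."""
    bad: List[int] = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            i += 1  # single ASCII byte, including control bytes such as \x00
        elif i + 1 < len(raw) and in_ranges(b, LEAD_RANGES) and in_ranges(raw[i + 1], TRAIL_RANGES):
            i += 2  # lead and trail both in range
        else:
            bad.append(i)
            i += 1  # skip only the offending byte and try again
    return bad


print(find_invalid_big5_pairs(data))
```

For the string in the question this flags offsets 12, 16 and 23 (`\xeb\x03`, `\xeb\x03` again, and a lone `\xfe`), which matches the comment's edit that more than one pair is out of range.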
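
Arndt Jonasson's observation that the first 12 bytes decode can be generalised: decode everything up to the exception's `start` offset. This is a sketch that assumes a stateless codec such as big5, where the prefix before a failure decodes independently. `decode_valid_prefix` is a hypothetical name.

```python
from typing import Tuple

data = b"\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03"


def decode_valid_prefix(raw: bytes, encoding: str) -> Tuple[str, int]:
    """Decode as much of raw as possible; return (text, number_of_bytes_used)."""
    try:
        return raw.decode(encoding), len(raw)
    except UnicodeDecodeError as err:
        # For a stateless codec, every byte before err.start was accepted
        return raw[:err.start].decode(encoding), err.start


text, used = decode_valid_prefix(data, "big5")
print(used, repr(text))
```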
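
If the data really is NUL-terminated, as pschill suggests, only the bytes before the first `\x00` need decoding. This sketch does that and tries more than one candidate codec in strict mode. Adding `gbk` to the list is my own assumption, not something from the question. The eight bytes before the NUL are valid in both big5 and gbk, which suggests that a detector can easily pick the wrong encoding for a string this short. `decode_candidates` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional

data = b"\xa1\xb6\xb0\xb5\xd2\xb9\xa1\xb7\x00\x02\x00`\xeb\x03h\x10\xeb\x03\x03\x12+\x00\n\xfe\n\x03"


def decode_candidates(payload: bytes, encodings: Iterable[str]) -> Dict[str, Optional[str]]:
    """Decode payload strictly with each encoding; None marks a failed decode."""
    results: Dict[str, Optional[str]] = {}
    for enc in encodings:
        try:
            results[enc] = payload.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Assumption from the comment: the text ends at the first NUL byte
payload = data.split(b"\x00", 1)[0]
for enc, decoded in decode_candidates(payload, ["big5", "gbk"]).items():
    print(enc, repr(decoded))
```

Deciding which result is right still means asking where the bytes came from, as the comments do. The code can only say which decodings are well-formed.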
