Encoding: does it make sense to try to decode an ISO-8859-15 string with Chinese characters into a UTF-8 string?

Question

I have the following string encoded with ISO-8859-15 stored inside of a file:

DEBUG_RECEIVED: ????

The correct UTF-8 string though is:

DEBUG_RECEIVED: 测试手机

Does it make sense to try to convert those wrong ???? characters back into 测试手机 (that is, from ISO-8859-15 to UTF-8 again), or is it simply impossible because ISO-8859-15 is not used for Chinese characters and, since it uses 8 bits per character, the 16 bits needed for Chinese characters are lost?

When I try the following:

echo "DEBUG_RECEIVED: ????" | iconv -f iso-8859-15 -t utf-8

I still get DEBUG_RECEIVED: ???? as output.

I am a bit confused about this. If you can clarify this detail, it would be great.

Thanks for the attention.

The `?` might actually just be `?`, so, there might not be any information left of the original string. — Marcus Müller, Jul 10 '16 at 20:56
It's not possible to represent Chinese characters directly in ISO-8859-15. — Keith Thompson, Jul 10 '16 at 21:00
@tonix: Keith is right. ISO-8859-15 is meant for Western European languages, with few differences from ISO-8859-1. Chinese is encoded using GB2312, GB18030 or Big5 instead, if not one of the UTFs. — Remy Lebeau, Jul 12 '16 at 21:50
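
To check the point raised in the comments, that each `?` may be a literal question mark rather than a damaged multi-byte sequence, one option is to look at the raw bytes. The following is a minimal sketch, assuming POSIX `od` and `tr` are available and that the script itself is saved as UTF-8; the helper name `hex_bytes` is made up for illustration.

```shell
# hex_bytes: hypothetical helper that prints stdin as one lowercase hex string.
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# The text as it appears in the file: each "?" is the single byte 0x3f.
stored=$(printf '%s' 'DEBUG_RECEIVED: ????' | hex_bytes)

# The intended text (assuming this script is saved as UTF-8):
# each of the four Chinese characters occupies three bytes.
intended=$(printf '%s' '测试手机' | hex_bytes)

printf '%s\n' "$stored"    # 44454255475f52454345495645443a203f3f3f3f
printf '%s\n' "$intended"  # e6b58be8af95e6898be69cba
```

If the dump of the real file ends in `3f3f3f3f`, the four characters have already been collapsed into four identical bytes, and nothing in those bytes distinguishes one original character from another.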

score 1 · Accepted Answer · answered Jul 10 '16 at 20:58

Yes, whatever generated the 8859-15 string had to discard the information necessary to represent Chinese characters.

Lost info is lost – your Chinese characters seem to have been replaced by ?, and there is nothing that can get them back.
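
As a rough illustration of why `iconv` returns the same `????`, the sketch below converts two inputs from ISO-8859-15 to UTF-8 and prints the resulting bytes. It assumes an `iconv` that accepts the names `ISO-8859-15` and `UTF-8`, plus POSIX `od` and `tr`; the helper name `hex_bytes` is hypothetical. The byte 0xA4 is included only as a contrast: it is the euro sign in ISO-8859-15, so it does change during conversion.

```shell
# hex_bytes: hypothetical helper that prints stdin as one lowercase hex string.
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# Four literal question marks: 0x3f means "?" in both encodings,
# so the conversion passes them through unchanged.
question_marks=$(printf '%s' '????' | iconv -f ISO-8859-15 -t UTF-8 | hex_bytes)

# Contrast: byte 0xA4 (octal 244) is the euro sign in ISO-8859-15
# and becomes the three-byte UTF-8 sequence for U+20AC.
euro=$(printf '\244' | iconv -f ISO-8859-15 -t UTF-8 | hex_bytes)

printf '%s\n' "$question_marks"  # 3f3f3f3f
printf '%s\n' "$euro"            # e282ac
```

The conversion only remaps bytes according to the source character set. Since `?` maps to itself, there is no hidden data for `iconv` to restore.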

Marcus Müller


Got it. Thank you for your clarification! – tonix Jul 10 '16 at 21:34
