
I have the following string encoded with ISO-8859-15 stored inside of a file:

DEBUG_RECEIVED: ????

The correct UTF-8 string though is:

DEBUG_RECEIVED: 测试手机

Does it make sense to try converting those wrong ???? characters back into 测试手机 (that is, from ISO-8859-15 to UTF-8 again), or is it simply impossible because ISO-8859-15 is not used for Chinese characters and, since it uses 8 bits per character, the 16 bits needed for Chinese characters are lost?

When I try the following:

echo "DEBUG_RECEIVED: ????" | iconv -f iso-8859-15 -t utf-8

I still get DEBUG_RECEIVED: ???? as output.

I am a bit confused about this. If you can clarify this detail, it would be great.

Thanks for the attention.

tonix
    The `?` might actually just be `?`, so, there might not be any information left of the original string. – Marcus Müller Jul 10 '16 at 20:56
    It's not possible to represent Chinese characters directly in ISO-8859-15. – Keith Thompson Jul 10 '16 at 21:00
    @tonix: Keith is right. ISO-8859-15 is meant for Western European languages, with few differences from ISO-8859-1. Chinese is encoded using GB2312, GB18030 or Big5 instead, if not one of the UTFs. – Remy Lebeau Jul 12 '16 at 21:50

1 Answer


Yes, it is impossible: whatever generated the ISO-8859-15 string had to discard the information necessary to represent the Chinese characters.

Lost info is lost – your Chinese characters seem to have been replaced by ?, and there is nothing that can get them back.
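To illustrate why the conversion cannot be undone, here is a minimal Python sketch. It assumes the program that wrote the file behaved like Python's `errors="replace"` handler, substituting `?` for every character the target charset cannot represent; how the file was actually produced is not known from the question.

```python
original = "测试手机"

# ISO-8859-15 has no Chinese characters, so a strict encode fails outright.
try:
    original.encode("iso-8859-15")
except UnicodeEncodeError as exc:
    print("strict encoding fails:", exc.reason)

# Assumption: the writer replaced each unencodable character with "?".
lossy_bytes = original.encode("iso-8859-15", errors="replace")
print(lossy_bytes)  # four literal 0x3F bytes

# Decoding those bytes and re-encoding as UTF-8 only yields "?" again,
# because the stored bytes no longer depend on the original characters.
recovered = lossy_bytes.decode("iso-8859-15")
print(recovered)
print(recovered.encode("utf-8"))
```

Any other four unrepresentable characters would produce the same four bytes, which is why `iconv -f iso-8859-15 -t utf-8` has nothing to work with.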

Marcus Müller