
I am reading a Unicode string from an XML file in a C++ program, and when I print the string I get the Unicode escape sequences printed as plain text instead of the decoded characters. I initially tried it with a plain string and then used a wide string for decoding. Here is the code snippet I used:

std::wstring wide_string = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes(plainString.value());

printf("\n[%ls]|[logs]|[info]: wide string...", wide_string);

This is my Unicode string: \u30b3\u30f3\u30dd\u30fc\u30cd\u30f3\u30c8\u30a4\u30f3\u30b9\u30c8\u30fc\u30e9\u30fc

Expected output: コンポーネントインストーラー

But I am getting the Unicode escape sequences printed as they are. Any help would be greatly appreciated.

user1
  • You should show the string you are trying to convert, along with the expected and actual outputs, to help others reproduce the problem. – Serge Ballesta Feb 26 '20 at 09:13
  • Could you dump the hex values of the first characters of the string (`for(int i=0; i<10; i++) { printf(" %04x", (unsigned int) wide_string[i]); }`)? I suspect the `\u` to be actual string characters... – Serge Ballesta Feb 26 '20 at 09:40
  • you need to use `wide_string.c_str()` to print because `%ls` expects a `wchar_t*`, not `std::wstring` – phuclv Feb 26 '20 at 12:16
  • It would also help to know what platform you're using, since it's about how characters display in a console. For instance, Unicode in the Windows command prompt is... unique https://ss64.com/nt/chcp.html – parktomatomi Feb 26 '20 at 14:49
  • Yes, I am using a Windows platform. – user1 Feb 26 '20 at 15:04
  • The source string is encoded using `\uXXXX` char sequences, ie `\u30b3` is literally the 6 chars `'\'` `'u'` `'3'` `'0'` `'b'` `'3'` in the string. You need to manually decode those sequences, ie replace the 6-char substring `\u30b3` with a single char of numeric value `0x30b3`, before then passing the result to `printf()` (why not `std::wcout`?), which will not do that decoding for you. I wonder why the XML contains these sequences to begin with, instead of using standard XML entity/character references, ie `&#x30b3;`? If it had, your XML library would have decoded them for you. – Remy Lebeau Feb 26 '20 at 22:34
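
The manual decoding described in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `decode_u_escapes` is a hypothetical helper name (not part of any library), the input is assumed to be plain ASCII apart from the `\uXXXX` sequences, and each escape is turned into exactly one `wchar_t`. That suits Windows, where `wchar_t` holds UTF-16 code units; on platforms with a 32-bit `wchar_t`, surrogate pairs would additionally need to be combined.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a library function): replaces every literal
// 6-character sequence \uXXXX with the single wchar_t of value 0xXXXX.
// Assumption: all other input bytes are plain ASCII and are copied as-is.
std::wstring decode_u_escapes(const std::string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        // An escape needs a backslash, a 'u', and four hex digits.
        bool is_escape = in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u';
        for (std::size_t k = 2; is_escape && k < 6; ++k) {
            is_escape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;
        }
        if (is_escape) {
            unsigned long code = std::stoul(in.substr(i + 2, 4), nullptr, 16);
            assert(code <= 0xFFFF);  // four hex digits always fit in 16 bits
            out.push_back(static_cast<wchar_t>(code));
            i += 6;
        } else {
            // Not a well-formed escape: keep the character unchanged.
            out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(in[i])));
            ++i;
        }
    }
    return out;
}
```

With this in place, something like `std::wstring w = decode_u_escapes(plainString);` followed by `printf("%ls", w.c_str());` (note the `c_str()`, as pointed out above) would pass the decoded characters to the output function. Whether they are displayed correctly still depends on how the Windows console is configured.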
