In my server application I am trying to parse responses that contain UTF-8 encoded Greek text, but because my local code page is 1254 I cannot represent every Greek character.

I tried setting my thread's locale to 1253, but it did not work. I want to know if there is a way to convert a UTF-8 string to Windows-1253 (Greek) on my code page 1254 machine, just for certain client responses.

By the way, when I change my regional settings to Greek I do not have any problem, but I cannot use this solution because my local settings must remain as they are (Windows-1254).

Update based on comments:

This is the response I get in UTF-8

"Valuation_Date":"10\/10\/2019 12:00:00 πμ"

This is how my application gets it

"Valuation_Date":"10\/10\/2019 12:00:00 πμ"

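Code that this string passes through after it has been converted to a Unicode wstring:

#include <string>
#include <windows.h>

// Converts a UTF-16 wide string to Windows-1253 (Greek) bytes.
std::string returnGivenCodePage(const std::wstring &unicodeString)
{
    std::string result;
    if (unicodeString.empty())
        return result;
    // First call: ask how many bytes the converted string needs.
    int numberOfBytesNeeded = WideCharToMultiByte(1253, WC_NO_BEST_FIT_CHARS,
            unicodeString.c_str(), (int)unicodeString.length(),
            NULL, 0, NULL, NULL);
    if (numberOfBytesNeeded <= 0)
        return result; // conversion failed; see GetLastError()
    // Allocate the buffer before writing into &result[0].
    result.resize(numberOfBytesNeeded);
    WideCharToMultiByte(1253, WC_NO_BEST_FIT_CHARS,
            unicodeString.c_str(), (int)unicodeString.length(),
            &result[0], numberOfBytesNeeded, NULL, NULL);
    return result;
}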

And finally, this is the result after I convert it to the Windows ANSI code page 1253 (Greek locale), while my default locale is 1254 (Turkish):

<ValuationDate>10/10/2019 12:00:00 ğì</ValuationDate>

And of course this is just a small part of a really big response.
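The `ğì` in the converted output is consistent with correct Windows-1253 bytes being displayed under code page 1254: π and μ are the bytes 0xF0 and 0xEC in Windows-1253, and those same two bytes are ğ and ì in Windows-1254. The following is a minimal, portable sketch that illustrates this. The names `decodeUtf8`, `utf8ToCp1253Subset` and `checkGreekAmPm` are hypothetical, and the mapping is an assumption-limited subset (ASCII plus U+0390..U+03CE); it is not a substitute for `MultiByteToWideChar` and `WideCharToMultiByte`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes one UTF-8 sequence starting at s[i] and advances i past it.
// Only basic structural validation is done (illustrative, not hardened).
static std::uint32_t decodeUtf8(const std::string& s, std::size_t& i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    int extra = 0;
    std::uint32_t cp = 0;
    if (b0 < 0x80)                { cp = b0;        extra = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else throw std::runtime_error("invalid UTF-8 lead byte");
    for (; extra > 0; --extra) {
        if (i >= s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        const unsigned char b = static_cast<unsigned char>(s[i++]);
        if ((b & 0xC0) != 0x80)
            throw std::runtime_error("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Hypothetical helper: converts UTF-8 to Windows-1253 bytes for ASCII and
// for U+0390..U+03CE (U+03A2 is unassigned). Anything else becomes '?'.
std::string utf8ToCp1253Subset(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const std::uint32_t cp = decodeUtf8(utf8, i);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp >= 0x0390 && cp <= 0x03CE && cp != 0x03A2) {
            // In Windows-1253 this block sits at a fixed offset: U+0391 -> 0xC1.
            out.push_back(static_cast<char>(cp - 0x02D0));
        } else {
            out.push_back('?');
        }
    }
    return out;
}

// Usage check: UTF-8 "πμ" (CF 80 CE BC) becomes the 1253 bytes F0 EC,
// which a code page 1254 viewer renders as "ğì".
void checkGreekAmPm() {
    assert(utf8ToCp1253Subset("\xCF\x80\xCE\xBC") == "\xF0\xEC");
}
```

If this reading is right, the conversion itself may already be producing the intended bytes, and the remaining question is which code page is used when those bytes are later displayed or processed.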

What I actually want is to convert a UTF-8 string to Windows-1253 (Greek) and, after processing it, convert it back to UTF-8, while my current default locale stays 1254 (Turkish).

If you need further information, I'll be glad to share more.

cangermi
  • Windows uses Unicode for strings so you only need to specify a codepage when reading *non*Unicode input or writing to *non*Unicode output. If you use Unicode strings and types in your application (std::u16string and char16_t) you'll only have to worry about conversions during input/output. The compiler itself will complain if you mix up single-byte and multibyte strings – Panagiotis Kanavos Oct 08 '19 at 13:25
  • C and C++ have no special type for UTF8 though - you'll have to use `std::string` or `char` and make sure all of your code treats characters as UTF8 bytes. You'll have to convert strings from UTF8 to another codepage and vice versa in export/import – Panagiotis Kanavos Oct 08 '19 at 13:29
  • @PanagiotisKanavos "Windows uses Unicode for strings" This is a rather bold statement. – n. m. could be an AI Oct 10 '19 at 10:23
  • Code pages 1253 and 1254 have nothing to do with UTF-8. If you have UTF-8 data in an external file, you probably want to use code page 65001 (CP_UTF8) perhaps passing it to MultiByteToWideChar/WideCharToMultiByte functions. – n. m. could be an AI Oct 10 '19 at 10:28
  • @n.m. It's a fact since the first Windows NT version. We can argue about the Windows 95 line, but almost every Windows machine now runs a version of the Windows NT line and hence, uses Unicode – Panagiotis Kanavos Oct 10 '19 at 10:30
  • @PanagiotisKanavos I'm not sure what you mean by "ASCII functions", if that's A functions then A stands for "ANSI" and is a misnomer. These functions are not relevant to the question anyway. OP has data which is not in Windows brain-dead internal encoding, so those 'W' functions are of no help to them. – n. m. could be an AI Oct 10 '19 at 10:40
  • Do you need to parse Greek words? Please post some code *and data* which don't work – n. m. could be an AI Oct 10 '19 at 12:15
  • I have deleted my answer since I realize that I do not understand what you are asking and the answer was about something else. I would suggest you provide runnable code that illustrates the problem by creating undesired output while also explaining what kind of output you want instead. – Johnny Johansson Oct 11 '19 at 10:08
  • "This is how my application gets it" This is a meaningless statement. Your application gets a sequence of bytes. What you show is not a sequence of bytes, but a sequence of Unicode characters. In order to show this sequence, the original bytes must have been transcoded somehow. The process of transcoding is unknown and potentially wrong. If you want to show what bytes your application got, you need to present them in a portable format not subject to transcoding errors. Printing them in hexadecimal is fine. (to be continued) – n. m. could be an AI Oct 11 '19 at 17:16
  • I suspect it just gets Greek characters encoded as UTF-8, but who knows. If you want to say "This is how my application **prints** it to the console", then yes, this is a believable and likely correct statement. If you want your application to **print** strings like `πμ`, then you need to convert from UTF-8 to `wchar_t` (which uses UTF-16) and print that. Locales or Greek or Turkish Code pages are irrelevant to this process. The only code page you need to worry about is 65001 aka CP_UTF8. To reiterate, you should **NOT** ever mention 1253, 1254 or any other code page except 65001 in your program. – n. m. could be an AI Oct 11 '19 at 17:21
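
The suggestion in the comments to show the received bytes in hexadecimal could be followed with a small helper along these lines. This is only a sketch: `toHex` and `checkHexDump` are hypothetical names, and the input is assumed to be the raw response bytes held in a `std::string` before any code page conversion.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: renders each byte as two uppercase hex digits,
// separated by spaces, so the result survives any later transcoding.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0)
            out.push_back(' ');
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Usage check: the UTF-8 encoding of "πμ" dumps as four bytes.
void checkHexDump() {
    assert(toHex("\xCF\x80\xCE\xBC") == "CF 80 CE BC");
}
```

A dump such as `CF 80 CE BC` would indicate that the application received valid UTF-8 for `πμ`, independent of how a console or editor chooses to display it.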
