
In my application I use ICU UnicodeString to store my strings. Since I use some libraries that are incompatible with ICU, I need to convert UnicodeString to its platform-dependent representation.

Basically, what I need is the reverse of creating a new UnicodeString object from a string in the system encoding, i.e. the reverse of `new UnicodeString("string encoded in system locale")`.

I found this topic, so I know it can be done using a stringstream.

So my question is: can it be done in some simpler way, without using a stringstream for the conversion?

Trakhan
  • Why don't you want to use a stringstream? – Karl Knechtel Dec 08 '10 at 11:15
  • There's a hidden assumption here, that there _is_ a "platform dependent 8 bits representation". That's already untrue on Windows, where 8 bits representations are reserved for legacy (Windows 95) applications. For that reason, there's no need to support UTF-8 there: 15 year old apps wouldn't expect Unicode, and more modern (NT) apps would use the native UTF-16. – MSalters Dec 08 '10 at 11:17
  • A number of Unixes use UTF-8 for their string encoding. – Donal Fellows Dec 08 '10 at 11:32
  • @Donal: Your point? @MSalters: Plenty of Windows apps still need to consume UTF-8. For example, HTML/XML specs are defined in terms of it, as are many data formats. On-disk format is often UTF-8 even if the app uses UTF-16 internally. – Billy ONeal Dec 08 '10 at 14:01
  • @Billy ONeal: Of course UTF-8 exists, even on Windows. But it's never the "platform dependent representation", or `CP_ACP` as it's known on Windows. – MSalters Dec 08 '10 at 14:37
  • Karl Knechtel - it's not that I don't want to use stringstream; I was just curious whether it's the only way ... – Trakhan Dec 08 '10 at 14:46
  • others: I don't want to assume any encoding. I mentioned UTF-8 because I'm currently developing on Linux, where it is used. – Trakhan Dec 08 '10 at 14:55
  • @Trakhan: Please specify whether you want to be platform-independent or not, because the answer depends on the platform, as MSalters has explained. On Windows, the conversion to the platform dependent representation is the identity transform—both ICU's UnicodeString and Windows use UTF-16 as their native representation. – Philipp Dec 08 '10 at 15:30
  • It took Microsoft a long time, but there is now a native UTF-8 code page on Windows. Just noting the change, since 10 years later this question still comes up as #1 in Google. – Lothar May 24 '23 at 15:56

3 Answers


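I use:

```cpp
std::string converted;
us.toUTF8String(converted);
```

where `us` is an (ICU) UnicodeString. This always produces UTF-8, so it matches the platform representation only where the system encoding is UTF-8 (as on most Linux setups).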


You could use `UnicodeString::extract()` with a codepage (or a converter). Passing NULL for the codepage makes ICU use the codepage it detected as the platform default.

Steven R. Loomis

You could use the functions in ucnv.h, namely `void ucnv_fromUnicode(UConverter *converter, char **target, const char *targetLimit, const UChar **source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode *err)`. It's not a nice C++ API like UnicodeString, but it will work.

I'd recommend sticking with the `operator<<` you're already using if at all possible. It's the standard way to handle lexical conversions (e.g. string to/from integer) in C++ anyway.

Billy ONeal