Convert ICU UnicodeString to platform dependent char * (or std::string)

Question

In my application I use ICU UnicodeString to store my strings. Since I use some libraries incompatible with ICU, I need to convert UnicodeString to its platform dependent representation.

Basically, what I need is the reverse of creating a new UnicodeString object: new UnicodeString("string encoded in system locale").

I found this topic, so I know it can be done using a stringstream.

So my question is: can it be done in some simpler way, without using a stringstream for the conversion?

There's a hidden assumption here, that there _is_ a "platform dependent 8 bits representation". That's already untrue on Windows, where 8 bits representations are reserved for legacy (Windows 95) applications. For that reason, there's no need to support UTF-8 there: 15 year old apps wouldn't expect Unicode, and more modern (NT) apps would use the native UTF-16. — MSalters, Dec 08 '10 at 11:17
@Donal: Your point? @MSalters: Plenty of Windows apps still need to consume UTF-8. For example, HTML/XML specs are defined in terms of it, as are many data formats. On-disk format is often UTF-8 even if the app uses UTF-16 internally. — Billy ONeal, Dec 08 '10 at 14:01
@Billy ONeal: Of course UTF-8 exists, even on Windows. But it's never the "platform dependent representation", or `CP_ACP` as it's known on Windows. — MSalters, Dec 08 '10 at 14:37
@Karl Knechtel: it's not that I don't want to use stringstream; I was rather curious whether it's the only way ... — Trakhan, Dec 08 '10 at 14:46
@others: I don't want to assume any encoding. I said UTF-8 because I'm currently developing on Linux, where it is used. — Trakhan, Dec 08 '10 at 14:55
@Trakhan: Please specify whether you want to be platform-independent or not, because the answer depends on the platform, as MSalters has explained. On Windows, the conversion to the platform dependent representation is the identity transform—both ICU's UnicodeString and Windows use UTF-16 as their native representation. — Philipp, Dec 08 '10 at 15:30
It took Microsoft a long time, but there is now a native UTF-8 code page on Windows. Just commenting about a change 10 years later, since this question still comes up as #1 in Google. — Lothar, May 24 '23 at 15:56

score 5 · Answer 1 · answered Feb 10 '15 at 10:15

I use:

std::string converted;
us.toUTF8String(converted);

where us is an ICU UnicodeString.

CWTstackoverflow


score 3 · Accepted Answer · answered Dec 08 '10 at 17:27

You could use UnicodeString::extract() with a codepage name (or a converter). Passing NULL for the codepage makes ICU use whatever it detected as the default codepage.

Steven R. Loomis


Ahh, that's what I've been searching for. – Trakhan Dec 09 '10 at 18:05

score 0 · Answer 3 · answered Dec 08 '10 at 15:20

You could use the functions in ucnv.h, namely `void ucnv_fromUnicode(UConverter *converter, char **target, const char *targetLimit, const UChar **source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode *err)`. It's not a nice C++ API like UnicodeString, but it will work.

I'd recommend just sticking with the operator<< you're already using if at all possible. It's the standard way to handle lexical conversions (e.g. string to/from integers) in C++ in any case.
