0

For example, I need the code point of the 5th character here, which is ð:

const WCHAR* mystring = L"Þátíð";

I know that its code point is U+00F0, but how do I get this integer using C++?

Steven Carlson
rsk82
  • 1
    wchar_t is a compiler-specific type; WCHAR indicates Windows (wchar_t == WCHAR == UCS-2/UTF-16LE). If you want portability, you need to change WCHAR to wchar_t. If this is Windows only, you should tag it with a windows tag... – Anders Jun 06 '12 at 11:46

2 Answers

2

WCHAR in Windows 2000 and later is UTF-16LE, so it is not necessarily safe to access a specific character in a string by index. You should use something like CharNext to walk the string so that surrogate pairs and combining characters/diacritics are handled correctly.

In this specific example, Forgottn's answer depends on the compiler emitting precomposed versions of the á and í characters. (This is probably true for most Windows compilers; porting to Mac OS is probably problematic.)
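
To illustrate the surrogate-pair part of this, here is a minimal portable sketch that walks UTF-16 code units and returns the n-th code point. The helper name codePointAt and the 0xFFFFFFFF "not found" value are my own assumptions, not part of any Windows API. It uses std::u16string rather than WCHAR so that it compiles anywhere, and it does not try to handle combining characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the n-th (0-based) code point of a UTF-16
// string, or 0xFFFFFFFF if the string holds fewer code points than that.
// Unpaired surrogates are returned unchanged.
std::uint32_t codePointAt(const std::u16string& s, std::size_t n)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        std::uint32_t cp = s[i++];
        // A high surrogate followed by a low surrogate encodes one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i < s.size()
            && s[i] >= 0xDC00 && s[i] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i] - 0xDC00);
            ++i;
        }
        if (n == 0)
            return cp;
        --n;
    }
    return 0xFFFFFFFF; // assumed sentinel: index out of range
}
```

With the precomposed form of the string from the question, codePointAt(u"\u00DE\u00E1t\u00ED\u00F0", 4) gives 0xF0. If á or í were stored as a base letter plus a combining mark, ð would no longer be at index 4.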

Anders
1
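#include <windows.h>
#include <strsafe.h>
#include <vector>

const WCHAR myString[] = L"Þátíð";
size_t myStringLength = 0;
if(SUCCEEDED(StringCchLengthW(myString, STRSAFE_MAX_CCH, &myStringLength)))
{
    LPCWSTR myStringIterator = myString;
    LPCWSTR myStringEnd = myString + myStringLength;
    // Advance by CharNextW steps, not by index: one step may span several code units.
    while(myStringIterator < myStringEnd)
    {
        // First UTF-16 code unit of the current character
        unsigned int mySuperSecretUnicodeCharacter = *myStringIterator;
        LPCWSTR myNextIterator = CharNextW(myStringIterator);
        // Any further code units that CharNextW grouped with it (e.g. combining marks)
        std::vector<unsigned int> diacriticsOfMySuperSecretUnicodeCharacter(myStringIterator+1, myNextIterator);
        myStringIterator = myNextIterator;
    }
}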

Edit 1: made it actually work

Edit 2: made it actually look for all codepoints

Forgottn
  • Okay, I give up... This: U+0061 U+0301 U+0302 U+0303 U+0304 (a´^~¯) cannot be held by any "normal" type. I assume, then, that you can never know the real code point. – Forgottn Jun 06 '12 at 11:47
  • A Unicode code point (a value in the Unicode codespace) will fit in 32 bits, but what a user calls a "character" might not fit... – Anders Jun 06 '12 at 11:53
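
The distinction in the last two comments can be sketched roughly like this: each code point fits in a char32_t, while a user-perceived character needs a sequence of them. The helper names isCombiningMark and groupWithMarks are made up for illustration, and treating only U+0300..U+036F as combining is a simplifying assumption; real text segmentation follows more rules than this.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as combining.
bool isCombiningMark(char32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: splits a sequence of code points into groups made of
// one base code point followed by the combining marks attached to it.
std::vector<std::u32string> groupWithMarks(const std::u32string& text)
{
    std::vector<std::u32string> groups;
    for (char32_t cp : text)
    {
        // Start a new group at every non-combining code point
        // (and at the very beginning, even if the text starts with a mark).
        if (groups.empty() || !isCombiningMark(cp))
            groups.emplace_back();
        groups.back().push_back(cp);
    }
    return groups;
}
```

For U"a\u0301\u0302\u0303\u0304b" this yields two groups: the first holds five code points (a plus four marks), the second holds just b.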