How can we find out whether a character belongs to a particular codepage? Or, how can we determine whether a character fits into the currently active IME for an application?
-
You need to define 'character'. Do you mean you have a UTF-16 or UTF-8 multibyte character, and you want to know if that translates to a point in a given Windows code page? – richb Mar 10 '10 at 12:18
-
Yes, that is right: the character could be a UTF-8 character, and I need to find out whether it translates to a codepoint in a given Windows codepage. – Prakash Mar 10 '10 at 12:31
4 Answers
- First, convert your UTF-8 string of characters to UTF-16 using MultiByteToWideChar.
- Now, reverse the process using WideCharToMultiByte, passing the desired codepage as the first parameter.
Use the WC_ERR_INVALID_CHARS flag and WideCharToMultiByte will fail outright if any invalid characters are used (note that the documentation restricts this flag to CP_UTF8 and code page 54936). If you want to know which characters are not represented in the target codepage, use the lpDefaultChar and lpUsedDefaultChar parameters.
LPCWSTR pszUtf16; // converted from the UTF-8 source string
UINT nTargetCP = CP_ACP;
BOOL fBadCharacter = FALSE;
// cchWideChar = -1: pszUtf16 is null-terminated; NULL/0 output buffer: size query only
if (WideCharToMultiByte(nTargetCP, WC_NO_BEST_FIT_CHARS, pszUtf16, -1, NULL, 0, NULL, &fBadCharacter))
{
    if (fBadCharacter)
    {
        // at least one character in the string was not represented in nTargetCP
    }
}

The two previous answers have correctly suggested using MultiByteToWideChar then WideCharToMultiByte to translate your UTF-8 character to UTF-16, then to the current Windows codepage (CP_ACP). Check the result of WideCharToMultiByte to see if the conversion was successful.
What wasn't clear from the original question is that you are having a particular issue with Hindi. For this language, your question is meaningless because there is no Windows ANSI codepage for Hindi, as Chris Becke pointed out. Therefore, you can never convert a Hindi character to CP_ACP, and WideCharToMultiByte will always report that it had to substitute the default character.
To use Hindi on Windows, as far as I understand it, you must be a Unicode app that calls Unicode APIs.

Using the Windows functions WideCharToMultiByte and MultiByteToWideChar, you can convert between UTF-8 and 16-bit Unicode (UTF-16) characters. The functions have arguments to specify the code page and to specify the behavior if an invalid character is encountered.

-
Thanks. Yes, you are right: I was using the LPBOOL lpUsedDefaultChar parameter of WideCharToMultiByte() to determine this. However, for the Hindi IME, which has code page 0, the resulting lpUsedDefaultChar is always true. [Not sure how my previous comment got removed :( but I had described it in detail there.] – Prakash Mar 10 '10 at 13:44
Thanks, Chris. I am running the following code:
#include <windows.h>
#include <stdio.h>

#define CP_HINDI 0 /* note: 0 is the value of CP_ACP, the system default ANSI code page */
#define CP_JAPANESE 932
#define CP_ENGLISH 1252

/* Wide character literals need the L prefix. */
wchar_t wcsStringJapanese = L'あ';
wchar_t wcsStringHindi = L'र';
wchar_t wcsStringEnglish = L'A';

int main()
{
    BOOL usedDefaultCharacter = FALSE;

    /* Test for English. The length is 1 because a single wchar_t is not null-terminated. */
    WideCharToMultiByte(CP_ENGLISH, 0, &wcsStringEnglish, 1, NULL, 0, NULL, &usedDefaultCharacter);
    printf("usedDefaultCharacters for English? %d \n", usedDefaultCharacter);

    /* Test for Japanese */
    usedDefaultCharacter = FALSE;
    WideCharToMultiByte(CP_JAPANESE, 0, &wcsStringJapanese, 1, NULL, 0, NULL, &usedDefaultCharacter);
    printf("usedDefaultCharacters for Japanese? %d \n", usedDefaultCharacter);

    /* Test for Hindi */
    usedDefaultCharacter = FALSE;
    WideCharToMultiByte(CP_HINDI, 0, &wcsStringHindi, 1, NULL, 0, NULL, &usedDefaultCharacter);
    printf("usedDefaultCharacters for Hindi? %d \n", usedDefaultCharacter);

    return 0;
}
The above code returns:
usedDefaultCharacters for English? 0
usedDefaultCharacters for Japanese? 0
usedDefaultCharacters for Hindi? 1
The third line is incorrect: the codepage for Hindi is 0 and the string passed consists of a Hindi character, yet usedDefaultChar is still set to 1, which should not be the case.

-
The codepage for Hindi is NOT zero. Hindi is one of the new 'Unicode only' localizations. There is no actual Windows ANSI codepage for representing Hindi characters. Refer to this page: http://msdn.microsoft.com/en-us/goglobal/bb688174.aspx – Chris Becke Mar 10 '10 at 15:18
-
So is there any value that I can give for the "codepage" parameter of WideCharToMultiByte to find out whether the current encoding supports the Hindi character? Or is there a way (in C++) to find out whether the current encoding of the page is Unicode? Thanks. – Prakash Mar 10 '10 at 17:03