I have a string table which defines a string in Chinese like this:

STRINGTABLE
    LANGUAGE 0x0C04, 0x03
BEGIN
    1000    "检查环境..."
    ...
END

I am trying to load that string into a wchar_t buffer as follows:

#define UNICODE
#define _UNICODE
#include <windows.h>

wchar_t buffer[512];
// With UNICODE defined, LoadString and MessageBox expand to the W versions.
LoadString(DLL_HANDLE, (UINT) msg_num, buffer, 512);
MessageBox(NULL, buffer, NULL, MB_OK);

However, the string that is loaded into the buffer is different from the one in my string table.

It looks like this in my string table:

检查环境...

But this is how it turns out on screen:

環境をãƒã‚§ãƒƒã‚¯ä¸­...
Brian Tompsett - 汤莱恩
user1202422
  • Are you sure your string table is wide-character (UTF-16) and not multibyte? – sree Apr 10 '13 at 21:39
  • Make sure your string table uses `L"检查环境"` – Jesse Good Apr 10 '13 at 21:43
  • @JesseGood: resource files (including the string table) do not need the `L` on their strings. – Nate Hekman Apr 10 '13 at 21:50
  • I have a Chinese resource file and at the top it specifies `LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED` and `#pragma code_page(936)`. I don't know if those would affect the problem you're seeing, but worth checking. – Nate Hekman Apr 10 '13 at 21:52
  • How are you printing it to the screen? Have you checked in the debugger what the actual contents of your buffer are? – Jonathan Potter Apr 10 '13 at 22:19
  • @NateHekman: Are you sure? [According to here](http://msdn.microsoft.com/en-us/library/windows/desktop/aa381050%28v=vs.85%29.aspx): `To encode Unicode characters, use an "L" followed by the Unicode characters.` – Jesse Good Apr 10 '13 at 22:25
  • @JesseGood: Hmm, the docs are on your side! All I can say is our Chinese resources do not have the `L` and yet they work. – Nate Hekman Apr 10 '13 at 22:28
  • @NateHekman: You are probably using a Unicode-enabled editor, as [mentioned here](http://msdn.microsoft.com/en-us/library/cc194805.aspx): `The Win32 resource compiler can process files encoded in Unicode, but you would need to create such a file using a Unicode-enabled editor.`. – Jesse Good Apr 10 '13 at 22:47
  • @JonathanPotter I have checked the contents of my buffer inside the VS debugger, and they are the exact same as what is printed to the MessageBox. – user1202422 Apr 11 '13 at 00:01
  • @sree didn't know we could specify a type of string table. Right now, I am just using the standard way of creating a string table (updated in question) – user1202422 Apr 11 '13 at 00:04
  • What encoding is your .rc file stored in? I'm not sure that VS supports UTF-8, you may need to save it as a UTF-16 file. You can check the encoding using a text editor like Notepad++. – Jonathan Potter Apr 11 '13 at 00:04
  • @JonathanPotter It's encoded in UTF-8 without BOM according to Notepad++, but there is no option to save as UTF-16. – user1202422 Apr 11 '13 at 00:17
  • @JesseGood - tried that, doesn't change things unfortunately. – user1202422 Apr 11 '13 at 00:24
  • The option you would want is UCS-2 Little Endian. – Jonathan Potter Apr 11 '13 at 00:26
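
The comments above come down to one question: which encoding is the .rc file actually stored in? A first-pass check is to look for a byte order mark (BOM) at the start of the file. The sketch below is only an illustration: the `Bom` enum and `detect_bom` function are hypothetical names, not part of any Windows API, and a file with no BOM (such as the "UTF-8 without BOM" case reported above) cannot be identified this way.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: classify a file by the byte order
// mark found in its first bytes. Absence of a BOM proves nothing about the
// encoding; it only means the file has to be identified some other way.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

Bom detect_bom(const std::string& bytes) {
    // Read bytes as unsigned so values >= 0x80 compare correctly.
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;      // EF BB BF
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;   // FF FE (a UTF-32 LE BOM also starts this way)
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;   // FE FF
    return Bom::None;
}
```

Passing the first few bytes of the resource file to `detect_bom` would return `Bom::Utf16LE` for a file saved with the "UCS-2 Little Endian" option and `Bom::None` for the file described in the question.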

2 Answers

Doesn't the `MessageBox` function work on narrow strings by default? Wouldn't you need to use `MessageBoxW`?

Edit:

A couple of things to check. The encoding of L"..." strings is implementation-defined. The standard makes no mention of the encoding of wchar_t characters, so make sure you're using the same encoding that Windows expects. (If I recall correctly, Windows expects UTF-16, but I may well be wrong on this.)

C++11 introduces three new string literal types with the prefixes "u8", "u" and "U", which specify UTF-8, UTF-16 and UTF-32, respectively. From what I can tell, C++11 still makes no guarantees about the encoding of "L"-prefixed literals, other than what is mentioned in §2.14.3:

A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character
literal has type wchar_t. The value of a wide-character literal containing a single c-char has value equal
to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char
has no representation in the execution wide-character set, in which case the value is implementation-defined.
[ Note: The type wchar_t is able to represent all members of the execution wide-character set (see 3.9.1).
—end note ]. The value of a wide-character literal containing multiple c-chars is implementation-defined.

Reference §3.9.1 P5 states:

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest
extended character set specified among the supported locales (22.3.1). Type wchar_t shall have the same
size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying
type. Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as
uint_least16_t and uint_least32_t, respectively, in <stdint.h>, called the underlying types.

Again, no mention of encoding. It is possible that Windows is expecting a different encoding than the one your resource string is using, hence the discrepancy.

You might verify this by calling MessageBox with an L"" string literal that spells out your characters using "\Uxxxxxxxx" escapes.
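
To compare the loaded buffer against a known-good literal, it can help to look at the raw code units rather than at the rendered text. The following is a minimal sketch, not a definitive tool: `dump_code_units` is a hypothetical helper name, and it assumes the buffer is null-terminated. The test string reuses the two code units from the MSDN example (`\x5e2e\x52a9`) rather than the string from the question.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration: format each code unit of a
// null-terminated wide string as uppercase hex (at least 4 digits),
// separated by spaces. Works for wchar_t, char16_t and char32_t buffers.
template <typename CharT>
std::string dump_code_units(const CharT* text) {
    assert(text != nullptr);
    std::string out;
    for (; *text != 0; ++text) {
        char buf[24];  // large enough for a 64-bit value in hex
        std::snprintf(buf, sizeof buf, "%04lX",
                      static_cast<unsigned long>(*text));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `dump_code_units(buffer)` after `LoadString` and `dump_code_units` on a literal written with escapes should give the same output if the resource was compiled with the expected encoding. For example, `dump_code_units(u"\u5e2e\u52a9")` yields `5E2E 52A9`.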

Nathan Ernst
  • `MessageBox` is a macro; when `UNICODE` is defined, `MessageBox` gets defined as `MessageBoxW`. Ditto for `LoadString`. The fact that it compiles when he's passing it a `wchar_t*` shows that that's not the problem. – Nate Hekman Apr 10 '13 at 23:51
  • Yes, I realised that after I hit submit and that I was posting an answer rather than a comment. (I was on a mobile device at the time). – Nathan Ernst Apr 11 '13 at 00:01
  • The string that is being outputted through the message box is the same as the value of the buffer in the debugger. – user1202422 Apr 11 '13 at 00:23
  • @user1202422: I'm curious: have you looked at a hexadecimal dump of the memory the string is pointing to, rather than the debugger visualizer, and compared that to what's expected in UTF-32? I'm still leaning towards an encoding issue from the info I've seen. From http://msdn.microsoft.com/en-us/library/dd374081.aspx, it seems my hypothesis was correct in that wchar_t is a UTF-16 encoded string under windows. – Nathan Ernst Apr 11 '13 at 01:06
  • Um, the C++ standard doesn't apply to the resource compiler. – Raymond Chen Apr 11 '13 at 01:22
  • @RaymondChen, I understand that. I'm curious what the binary representation of the `wchar_t` being sent to `MessageBox` is at this point. It still seems to me like an encoding problem. You are, of course, the Win32 Guru, so feel free to step in and correct me. – Nathan Ernst Apr 11 '13 at 02:03

The MSDN documentation states that the format should be similar to IDS_CHINESESTRING L"\x5e2e\x52a9". That's not the most formal of descriptions. I interpret it as stating that Unicode strings must be prefixed with L and encoded using \xhhhh hexadecimal escape codes.
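
If that interpretation is right, the escaped form can be generated instead of typed by hand. The sketch below is an assumption-laden illustration: `to_rc_literal` is a hypothetical helper name, and it assumes the resource compiler reads each `\x` escape inside an `L"..."` string as one 16-bit code unit, as the MSDN example suggests. Every code unit is escaped, including ASCII, so the result does not depend on the encoding of the .rc file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper for illustration: turn a UTF-16 string into the
// L"\xhhhh..." form shown in the MSDN example. Characters outside the
// Basic Multilingual Plane come out as two escapes (a surrogate pair).
std::string to_rc_literal(const std::u16string& text) {
    std::string out = "L\"";
    for (char16_t unit : text) {
        char buf[8];  // "\x" + 4 hex digits + terminating null
        std::snprintf(buf, sizeof buf, "\\x%04x",
                      static_cast<unsigned>(unit));
        out += buf;
    }
    out += '"';
    return out;
}
```

For instance, `to_rc_literal(u"\u5e2e\u52a9")` produces `L"\x5e2e\x52a9"`, matching the documentation's example.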

MSalters