How do I prevent a non-Unicode application from converting the character set of resources when loading them on a differently localized machine?

Question

We have a non-Unicode C++ application, written with Visual Studio, that was originally written for machines using the code page 1252 character set.

Our application performs many post-processing steps on the contents of the resources after reading them, including looking up resource strings in some files.

Now people in China are starting to use the application, and their machines use the PRC locale (which sets the default code page for non-Unicode applications to 936, a multibyte character set).

It appears that CString::LoadString performs a character-set conversion. This breaks further processing, because the content that we are looking for in the other files is no longer the same.

The same goes for CMenu::GetMenuString and CWnd::GetWindowText.

Unfortunately, we cannot simply convert our files with iconv, because LoadString, GetMenuString and GetWindowText behave as follows:

- some characters that are valid in code page 1252 have no equivalent in code page 936 (e.g. î, û, ñ, œ) and are replaced with question marks;
- some characters that are valid in code page 1252 have no equivalent in code page 936 (e.g. É) but are replaced with an alternate character (É => é);
- some characters exist in both code pages but have different representations, often two bytes in CP936;
- some characters (including all ASCII characters) are identical in both code pages.

I would like these three functions, which load resource contents, to return the binary content without performing any character-set conversion. I have tried modifying the .rc file with LANGUAGE LANG_INVARIANT, SUBLANG_NEUTRAL, but this did not change anything.

The resource file also includes a #pragma code_page(1252). Can this be safely removed? What is that pragma for?
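As far as I know, the resource compiler stores STRINGTABLE text in the compiled executable as UTF-16: #pragma code_page(1252) tells it how to read the 8-bit characters of the .rc source when it converts them, so removing the pragma would make that conversion depend on the compiler's default code page instead. It also means the raw "binary content" of a string resource is UTF-16, not the original 1252 bytes. The sketch below decodes a raw RT_STRING block, assuming the documented layout: 16 strings per block, block number id / 16 + 1, each entry a 16-bit length followed by that many UTF-16 code units. ExtractFromStringBlock and LoadRawString are hypothetical names; only FindResource, LoadResource, LockResource and SizeofResource are real Win32 calls, and that part compiles only on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: return string `id` from one raw RT_STRING block.
// A block holds 16 entries; each is a 16-bit length followed by that many
// UTF-16 code units (no terminating NUL). Empty or missing entries yield "".
std::u16string ExtractFromStringBlock(const std::uint16_t* block,
                                      std::size_t count, unsigned id) {
    const unsigned index = id % 16;  // position of the string inside its block
    std::size_t pos = 0;
    for (unsigned i = 0; i < 16 && pos < count; ++i) {
        const std::size_t len = block[pos++];
        if (pos + len > count) break;  // truncated data: give up
        if (i == index) return std::u16string(block + pos, block + pos + len);
        pos += len;
    }
    return std::u16string();
}

#ifdef _WIN32
// Hypothetical wrapper: locate the block that contains `id` and decode it.
std::u16string LoadRawString(HMODULE module, unsigned id) {
    HRSRC res = FindResource(module, MAKEINTRESOURCE(id / 16 + 1), RT_STRING);
    if (!res) return std::u16string();
    HGLOBAL handle = LoadResource(module, res);
    const void* data = handle ? LockResource(handle) : nullptr;
    if (!data) return std::u16string();
    const DWORD bytes = SizeofResource(module, res);
    return ExtractFromStringBlock(static_cast<const std::uint16_t*>(data),
                                  bytes / sizeof(std::uint16_t), id);
}
#endif
```

On Windows, LoadRawString(GetModuleHandle(nullptr), IDS_SOMETHING), with IDS_SOMETHING a placeholder ID, would return the untouched UTF-16 text, which could then be converted to code page 1252 explicitly.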

Thank you for your answers.

@David Hefernan: Because it would take us months. Our application is too complex for that (it interfaces with dozens of third-party products, including Oracle), and we need to solve this problem much sooner than that. – Benoit, Mar 01 '11 at 12:20
Then why not switch just the resources to Unicode? There's no hard technical requirement to do a big-bang switch. – MSalters, Mar 01 '11 at 13:04

score 3 · Answer 1 · answered Mar 01 '11 at 15:47


Maybe you can use BOOL SetThreadLocale( LCID Locale );

MSDN : SetThreadLocale affects the selection of resources with a LANGUAGE statement. The statement affects such functions as CreateDialog, DialogBox, LoadMenu, LoadString, and FindResource. It sets the code page implied by CP_THREAD_ACP, but does not affect FindResourceEx. For more information, see Code Page Identifiers.
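A minimal sketch of that suggestion, assuming the goal is to make the thread locale's code page 1252 while resources are loaded. ScopedThreadLocale and EnglishUsLocale are hypothetical helpers, and US English is only an example of a locale whose ANSI code page is 1252. The quoted documentation talks about resource selection and CP_THREAD_ACP; whether LoadString's ANSI conversion actually follows the thread locale on a given Windows version is an assumption to verify on a CP936 machine.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>

// Hypothetical RAII helper: switch the calling thread's locale for the
// lifetime of the object, then restore the previous locale.
class ScopedThreadLocale {
public:
    explicit ScopedThreadLocale(LCID locale)
        : previous_(GetThreadLocale()), ok_(SetThreadLocale(locale) != FALSE) {}
    ~ScopedThreadLocale() {
        if (ok_) SetThreadLocale(previous_);
    }
    ScopedThreadLocale(const ScopedThreadLocale&) = delete;
    ScopedThreadLocale& operator=(const ScopedThreadLocale&) = delete;
    bool ok() const { return ok_; }

private:
    LCID previous_;  // declared first, so it is read before SetThreadLocale runs
    bool ok_;
};

// Assumption: US English is used only as an example of a locale whose ANSI
// code page is 1252.
inline LCID EnglishUsLocale() {
    return MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
}
#endif
```

For example, `{ ScopedThreadLocale guard(EnglishUsLocale()); str.LoadString(IDS_SOMETHING); }`, where IDS_SOMETHING is a placeholder ID, would restore the original locale as soon as the block ends.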


engf-010


Calling "SetThreadLocale(LOCALE_SYSTEM_DEFAULT)" in my app's constructor did the trick. Apparently in VS2005 the default locale is "User Default", a change from VS6, where it was "System Default". "User Default" apparently ignores Windows XP MUI settings, which this app relies on for language settings. – Ronny Sherer Dec 05 '16 at 15:23

DavidK · Accepted Answer · 2011-03-11T08:22:24.563


For LoadString, the obvious thing to do would be to call the Win32 API function LoadStringW() directly, which returns the Unicode string. It might even work to use the CStringW form of CString, like this (not tested!):

CStringW str;
str.LoadString(...);
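A sketch of that idea combined with the explicit code page 1252 conversion discussed in the comments on this answer. LoadString1252 and ToCodePage1252 are hypothetical names. Passing 0 as the buffer size asks LoadStringW for a read-only pointer to the resource text instead of a copy. Per those comments, the conversion may still come out as CP936 if the application runs under AppLocale.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 text to code page 1252 explicitly, so the
// result does not depend on the machine's ANSI code page (CP_ACP).
std::string ToCodePage1252(const std::wstring& text) {
    if (text.empty()) return std::string();
    const int inLen = static_cast<int>(text.size());
    const int size = WideCharToMultiByte(1252, 0, text.data(), inLen,
                                         nullptr, 0, nullptr, nullptr);
    if (size <= 0) return std::string();
    std::string out(static_cast<std::size_t>(size), '\0');
    WideCharToMultiByte(1252, 0, text.data(), inLen, &out[0], size, nullptr, nullptr);
    return out;
}

// Hypothetical helper: load string resource `id` as UTF-16, then convert it.
std::string LoadString1252(HINSTANCE instance, UINT id) {
    wchar_t* raw = nullptr;  // will point into read-only resource memory
    // With a buffer size of 0, LoadStringW stores a pointer to the resource text
    // in `raw` and returns its length; the text is not NUL-terminated.
    const int len = LoadStringW(instance, id, reinterpret_cast<LPWSTR>(&raw), 0);
    if (len <= 0 || raw == nullptr) return std::string();
    return ToCodePage1252(std::wstring(raw, static_cast<std::size_t>(len)));
}
#endif
```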

The menu and window functions are more problematic. Calling the Unicode form of the Win32 API, GetMenuStringW(), directly should work. The window function GetWindowText() is the really awkward one: you can, of course, call the Win32 function GetWindowTextW(), but what it returns will depend on whether the window you call it on has an ANSI or a Unicode window procedure. If the underlying window is a Windows control, it's usually possible to get at the underlying window procedure and call it directly, but it's not pretty and it's not much fun.
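For menus, a sketch along the lines of what Benoit reports doing in the comments (GetMenuItemInfoW with a MENUITEMINFOW structure). GetMenuItemTextUtf16 is a hypothetical name; it follows the documented two-call pattern: first ask for the length with dwTypeData set to NULL, then pass a buffer of cch + 1 characters.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>
#include <string>

// Hypothetical helper: read the text of the menu item at `position` as UTF-16.
std::wstring GetMenuItemTextUtf16(HMENU menu, UINT position) {
    MENUITEMINFOW info = {};
    info.cbSize = sizeof(info);
    info.fMask = MIIM_STRING;
    info.dwTypeData = nullptr;  // first call: only ask for the length in `cch`
    if (!GetMenuItemInfoW(menu, position, TRUE, &info)) return std::wstring();

    std::wstring text(info.cch + 1, L'\0');  // cch excludes the terminating NUL
    info.cch += 1;                           // buffer size, NUL included
    info.dwTypeData = &text[0];
    if (!GetMenuItemInfoW(menu, position, TRUE, &info)) return std::wstring();
    text.resize(info.cch);  // on return, cch is the number of characters copied
    return text;
}
#endif
```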

Any chance of more detail on how you're trying to use these? It's worth noting that you list these functions as if all three access resources, but that's not true: only LoadString() does. The other two operate directly on the menu or window that exists in the running process, not on resources.

As an example of how it's possible to get around the GetWindowTextW() problems, have a look at the UnicodeEdit class from this project. This is an ANSI application that needed to work on Windows 9X, but also needed to be able to get Unicode text from an edit control if possible. The trick is that the class remembers whether the window procedure before subclassing was Unicode or ANSI, and if Unicode, calls that directly in its GetWindowText(). Depending on what you need, this sort of approach might help.
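A minimal sketch of the idea described above, not the UnicodeEdit code itself: record the window procedure and IsWindowUnicode() before the window is subclassed, and later send WM_GETTEXT to that original procedure with CallWindowProcW. OriginalProc, RememberOriginalProc and GetOriginalTextW are hypothetical names, and whether a particular control's original procedure still holds the untranslated text is an assumption to check case by case.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record of a window's procedure, captured *before* subclassing.
struct OriginalProc {
    WNDPROC proc;
    bool unicode;  // IsWindowUnicode() at the time the procedure was saved
};

// Call before MFC (or anything else) subclasses the window.
OriginalProc RememberOriginalProc(HWND hwnd) {
    return OriginalProc{
        reinterpret_cast<WNDPROC>(GetWindowLongPtrW(hwnd, GWLP_WNDPROC)),
        IsWindowUnicode(hwnd) != FALSE};
}

// Sends WM_GETTEXT straight to the saved procedure. Returns false when that
// procedure was ANSI; the caller then has to fall back to GetWindowTextW(),
// whose result has been mapped through the ANSI code page.
bool GetOriginalTextW(HWND hwnd, const OriginalProc& saved, std::wstring& out) {
    if (!saved.unicode) return false;
    const LRESULT len = CallWindowProcW(saved.proc, hwnd, WM_GETTEXTLENGTH, 0, 0);
    std::wstring buf(static_cast<std::size_t>(len) + 1, L'\0');
    const LRESULT copied = CallWindowProcW(saved.proc, hwnd, WM_GETTEXT,
                                           static_cast<WPARAM>(len + 1),
                                           reinterpret_cast<LPARAM>(&buf[0]));
    buf.resize(static_cast<std::size_t>(copied));
    out = buf;
    return true;
}
#endif
```

In an MFC dialog, RememberOriginalProc would be called before SubclassDlgItem or DDX_Control attaches the control, and GetOriginalTextW used in place of GetWindowText.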

edited Mar 11 '11 at 08:22

answered Mar 01 '11 at 12:39

DavidK


Probably the resource is read when `CMenu::LoadMenu` is called? – Benoit Mar 01 '11 at 12:53
Yes. LoadMenu() will be where it accesses the resource, which makes sense. Hopefully calling GetMenuStringW() directly is the way to go here: if that works only GetWindowText() is left as a tricky case. – DavidK Mar 01 '11 at 12:55
Yes, but I have the feeling that you've missed my goal: to end up with a buffer containing the contents of my resource as they were originally stored, namely in code page 1252. Wouldn't ::WideCharToMultiByte then convert to CP936 on the affected machines? Or maybe I misunderstood what you are suggesting. – Benoit Mar 01 '11 at 12:59
I was imagining that, armed with the Unicode form of the string, you'd call WideCharToMultiByte() with the first argument set to 1252, so that the result comes out in code page 1252: only if you call WideCharToMultiByte() with CP_ACP as the first argument will it use the default code page. If that's not suitable, you're left with the unpalatable choice of calling FindResource() and trying to decode the binary resource data, which is not fun. – DavidK Mar 01 '11 at 13:08
@DavidK: I was being stupid. I hadn't noticed that you can pass any code page to `WideCharToMultiByte`. Thanks, I will try that. – Benoit Mar 01 '11 at 13:28
Another interesting thing: `::WideCharToMultiByte` ignores the code page if you are running under [AppLocale](http://www.microsoft.com/downloads/en/details.aspx?FamilyID=8c4e8e0d-45d1-4d9b-b7c0-8430c1ac89ab)… I wondered why, even with 1252 as the parameter, I still got CP936 text in my debugger's memory window. – Benoit Mar 01 '11 at 13:54
Interesting ... I hadn't come across AppLocale before. – DavidK Mar 01 '11 at 14:30
Your solution works great regarding `LoadStringW` and `GetMenuStringW` (which I replaced with `GetMenuItemInfoW` with a `MENUITEMINFOW` structure). However I am not sure I understand what you state about window procedures. As you could predict, `GetWindowTextW` is not working properly. Could you please expand a bit? Thanks! – Benoit Mar 01 '11 at 16:23
Apologies Benoit, I've only just noticed this comment... The problem with GetWindowTextW is that what you get back depends on the window procedure of the window you're getting text from, in particular whether it's ANSI or Unicode. Even worse, subclassing a window with MFC can change the procedure from a Unicode one to an ANSI one. If there is an ANSI window procedure in the chain, the text gets mapped via the thread-local character set, so you won't get back the Unicode string you hoped for. – DavidK Mar 11 '11 at 07:59
If all you're trying to do is read strings from static controls in a dialog, you would be better off putting the strings into string resources and reading them directly with LoadStringW(). If the control you're trying to get text from is your own custom code, it might be worth adding your own message to get the appropriate Unicode string. I've edited my reply to add a final paragraph that links to a project I wrote to get round one version of this sort of problem. It all depends on what you're trying to do ... – DavidK Mar 11 '11 at 08:23
@DavidK: Thanks. Your answer taught me a lot, but we have actually implemented another solution (only ASCII in resources). – Benoit Mar 11 '11 at 08:28
The simplest solution (i.e. only ASCII) is usually the best :-) – DavidK Mar 11 '11 at 08:31
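Benoit's final approach keeps the resources ASCII-only. A hypothetical check like the one below, run over the .rc files as a build step, could keep it that way, assuming the .rc files are saved as 8-bit text (as the #pragma code_page(1252) suggests). It reports the offset of every byte at or above 0x80; bytes below 0x80 are plain ASCII and read the same under code pages 1252 and 936. FindNonAsciiBytes is an illustrative name, not part of any project mentioned here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the offset of every byte >= 0x80 in `text`. Bytes below 0x80 are
// plain ASCII and mean the same in code pages 1252 and 936, so an empty result
// means the text should survive either ANSI conversion unchanged.
std::vector<std::size_t> FindNonAsciiBytes(const std::string& text) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (static_cast<unsigned char>(text[i]) >= 0x80) offsets.push_back(i);
    }
    return offsets;
}
```

Reading an .rc file into a std::string and printing the returned offsets would point at every character that still needs replacing.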
