MFC CEdit converts non-ascii characters to ascii

Question

We have an MFC Windows Application, written originally in VC++ 6 and over the years updated for newer IDE, currently developed in VS2017.

The application is built with MBCS (not Unicode). Trying to switch to Unicode causes 3806 compile errors, and that is probably just the tip of the iceberg.

However we want to be able to run the application with different code page, ie. 1250 (Central European).

I tried to build a small test application, and managed to get it to work with special characters (čćšđž). I did this by setting dialog font to Microsoft Sans Serif with code page 1250. The same approach in our application does not work. Note: dialogs in our application are created dynamically, and font is set using SetFont.

There is a difference in how the special characters are treated in these two applications.

In the test application, the special characters are displayed in the edit control, and GetWindowText retrieves the right bytes. However, typing characters from other languages renders them as "????".
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to their similar ASCII counterparts (čćđ -> ccd).

I believe that Edit control in our application displays Unicode text, but GetWindowText converts it to ascii.

Does anyone have any idea what is happening here, and how I might solve it?

Note: I know how to convert the project to Unicode. We are choosing not to commit resources to it at the moment, as it would probably take weeks or months to implement. The question is how I might get it to work with MBCS, and why the edit control is converting Č to C.

If this is your first attempt at code pages at all with your own windowing API, then perhaps you could consider utf-8 instead. Alternatively, it is worth persevering with the unicode build. Any error where you have used char just change to tchar. But code pages is a poor way to go, as you only support 1 language at a time, and all your ducks have to match in the right order. — Gem Taylor, May 22 '19 at 11:14
Converting the app to UNICODE is definitely the best option, but depending on your code this may be more or less cumbersome. The first thing you can do is replace all string literals `"Abc"` with `_T("Abc")` and all `'X'` with `_T('X')`. This will probably already eliminate a lot of errors. Then get rid of all `char xx[yy]` and replace them with `CString`. Where you want to keep a `char xx[yy]`, replace it with `TCHAR xx[yy]`. Also replace all remaining `strlen`, `strcpy` etc. with `_tcslen`, `_tcscpy` etc. Replace `char` with `TCHAR`. Generally try to have as few raw char arrays as possible. — Jabberwocky, May 22 '19 at 12:02
The *Remarks* section for [IsWindowUnicode](https://learn.microsoft.com/en-us/windows/desktop/api/winuser/nf-winuser-iswindowunicode) probably explains, why you are seeing different results. — IInspectable, May 22 '19 at 21:20
@IInspectable This sounds interesting, I will call IsWindowUnicode on my code to check this. — Bojan Hrnkas, Jun 04 '19 at 09:33
Note: 3 years later we did convert to Unicode. The conversion took a couple hundred workhours, but it was the right solution. We never got the above approach to work. — Bojan Hrnkas, Jul 19 '22 at 16:41
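
The literal and buffer replacements suggested in the comments above can be sketched roughly as follows. This is only an illustration: `CopyGreeting` is a made-up helper, and it uses `_tcscpy_s` (the bounds-checked variant) rather than the `_tcscpy` named in the comment, since current MSVC warns about the latter. The same source compiles as `char` code under `_MBCS` and as `wchar_t` code under `_UNICODE`.

```cpp
#ifdef _WIN32  // tchar.h and the _t* routines are specific to the Microsoft CRT
#include <windows.h>
#include <tchar.h>

// Before (MBCS only):
//   char name[32];
//   strcpy(name, "Abc");
//   size_t n = strlen(name);
//
// After: TCHAR, _T() and the _tcs* routines follow the project's character set.
// CopyGreeting is a hypothetical helper used only for this illustration.
size_t CopyGreeting(TCHAR* dest, size_t destCount)
{
    // destCount is the buffer size in TCHARs, not in bytes.
    _tcscpy_s(dest, destCount, _T("Abc"));
    return _tcslen(dest);
}
#endif
```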

score 2 · Answer 1 · answered May 22 '19 at 16:01

I believe it is absolutely possible to port the application to other languages/codepages, you only need to modify the .rc (resource) files, basically having one resource file for each language, which you may rather want to do anyway, as strings in menus and/or string-tables would be in a different language. And this is actually the only change needed, as far as the application part is concerned.

The other part is the system you are running it on. A window can be Unicode or non-Unicode. You can see this with the Spyxx utility, which tells you whether a window (procedure) is Unicode or not (Window properties, General tab). While Unicode windows work properly, non-Unicode ones have to convert between Unicode and MBCS when getting or setting the text. The conversion is based on the system (default) code-page. This can only be set globally (for the whole machine), and not per application or window. And of course, setting the font's codepage is not enough (and imo it's not needed at all, if you are running the application on a machine with the "correct" codepage). That is, for non-Unicode applications, only one codepage will work properly; the others won't.
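
If you prefer to check this from code rather than with Spyxx, something along these lines might help. It is a minimal sketch: `WindowEncodingInfo` and `QueryWindowEncoding` are names I made up, while `IsWindowUnicode` and `GetACP` are the actual Win32 calls.

```cpp
#ifdef _WIN32  // Win32-only sketch
#include <windows.h>

// Hypothetical helper type for this illustration.
struct WindowEncodingInfo
{
    bool isUnicode;     // true if the window is a native Unicode window
    UINT ansiCodePage;  // ANSI code page used for ANSI <-> Unicode text conversion
};

// Reports whether hwnd is a Unicode window and which ANSI code page is active.
WindowEncodingInfo QueryWindowEncoding(HWND hwnd)
{
    WindowEncodingInfo info;
    info.isUnicode = ::IsWindowUnicode(hwnd) != FALSE;
    info.ansiCodePage = ::GetACP();
    return info;
}
#endif
```

Calling it for both the test application's edit control and the real application's edit control (for example with `edit.GetSafeHwnd()`) should show whether the two controls differ in this respect.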

I can see two options:

If you only need to update a small number of controls, it may be possible to change only these controls to Unicode and use the "wide" versions of the get/set window-text functions or messages. You will have to convert the text between Unicode and your desired codepage yourself. It requires writing some code, but has the advantage that the conversion is independent of the system default codepage: you can keep the codepage in a configuration file, in the registry, or pass it as a command-line option (in the application's shortcut). Some control types can be changed to Unicode and others cannot, so please check the documentation. I used this technique successfully for an MBCS application displaying/editing translated strings in many different languages, but I only had one control, a List-View, which offers the LVM_SETUNICODEFORMAT message and thus allows Unicode text even in an MBCS application.
The easiest method is to simply run the application as is, but it will only work on machines with the proper default codepage, as is the case for most non-Unicode applications.
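
For the first option, reading the text through the wide API and converting it yourself might look roughly like this. It is a sketch under assumptions: `GetWindowTextInCodePage` is a made-up name, and it only helps if the control really is a Unicode window (for example, created with `CreateWindowExW`); for an ANSI window the system has already converted the text with the default codepage before you see it.

```cpp
#ifdef _WIN32  // Win32-only sketch
#include <windows.h>
#include <string>

// Hypothetical helper: reads the control's text as UTF-16 and converts it to
// the given code page (e.g. 1250), independent of the system default code page.
std::string GetWindowTextInCodePage(HWND hwnd, UINT codePage)
{
    int len = ::GetWindowTextLengthW(hwnd);
    if (len <= 0)
        return std::string();

    // Read the text as UTF-16; the buffer needs room for the terminating null.
    std::wstring wide(static_cast<size_t>(len) + 1, L'\0');
    int copied = ::GetWindowTextW(hwnd, &wide[0], len + 1);
    if (copied <= 0)
        return std::string();
    wide.resize(static_cast<size_t>(copied));

    // First call asks for the required size in bytes, second call converts.
    int bytes = ::WideCharToMultiByte(codePage, 0, wide.c_str(), copied,
                                      nullptr, 0, nullptr, nullptr);
    if (bytes <= 0)
        return std::string();

    std::string result(static_cast<size_t>(bytes), '\0');
    ::WideCharToMultiByte(codePage, 0, wide.c_str(), copied,
                          &result[0], bytes, nullptr, nullptr);
    return result;
}
#endif
```

With an MFC control this could be called as `GetWindowTextInCodePage(edit.GetSafeHwnd(), 1250)`, with the codepage read from wherever you decide to configure it.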

The system default codepage can be changed by setting the "Language for non-Unicode programs" option, available in the regional settings, Administrative tab, and requires a reboot. Changing the Windows UI language will change this option as well, but by setting this option you don't need to change the UI language, eg you can have English UI and East-European codepage.

See a very similar post here.

score 1 · Answer 2 · answered Jul 19 '22 at 07:31

Late to the party:

> In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) convert the special characters to the similar ascii counterpart (čćđ -> ccd).

That sounds like the ES_OEMCONVERT flag has been set for the control:

> Converts text entered in the edit control. The text is converted from the Windows character set to the OEM character set and then back to the Windows character set. This ensures proper character conversion when the application calls the CharToOem function to convert a Windows string in the edit control to OEM characters. This style is most useful for edit controls that contain file names that will be used on file systems that do not support Unicode.
> To change this style after the control has been created, use SetWindowLong.
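
To check for the flag and clear it, something like the following could work. It is a minimal sketch: `HasOemConvert` and `ClearOemConvert` are made-up helper names, and whether the flag is actually set in your application is something you would need to verify first.

```cpp
#ifdef _WIN32  // Win32-only sketch
#include <windows.h>

// Hypothetical helper: returns true if the edit control has ES_OEMCONVERT set.
bool HasOemConvert(HWND hwndEdit)
{
    LONG style = ::GetWindowLong(hwndEdit, GWL_STYLE);
    return (style & ES_OEMCONVERT) != 0;
}

// Hypothetical helper: clears ES_OEMCONVERT on an existing edit control.
void ClearOemConvert(HWND hwndEdit)
{
    LONG style = ::GetWindowLong(hwndEdit, GWL_STYLE);
    if (style & ES_OEMCONVERT)
        ::SetWindowLong(hwndEdit, GWL_STYLE, style & ~ES_OEMCONVERT);
}
#endif
```

In MFC, `edit.ModifyStyle(ES_OEMCONVERT, 0)` should have the same effect. Since your dialogs are created dynamically, it may also be worth checking the style value passed when the controls are created.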
