
I came across some application code and noticed that, for the Windows-specific path, they convert the command-line arguments with additional WCHAR-related processing (inside #ifdef _WIN32).

What would be the reason for adding the WCHAR-related processing for Windows in the snippet below? I am trying to understand the rationale behind invoking the WCHAR-related functions. In the end, the coder calls myFunc, which could have been called directly with the normal argc and argv. Why would someone feel the need to add the extra wchar processing?

int main(int argc, char* argv[])
{

#ifdef _WIN32

    int argc_w;
    LPWSTR* argv_w = CommandLineToArgvW(GetCommandLineW(), &argc_w);
    std::vector<char*> argv_vector;
    int result;
    if (ConvertToUtf8(argc_w, argv_w, argv_vector)) {
        result = myFunc(argc_w, argv_vector.data());
    } else {
        result = myFunc(argc, argv);
    }

    // code to free vector
    return result;
#else
    int (*ptrMyFunc)(int, char**, const char*);

    void *dllHandle = dlopen(myDLL.c_str(), RTLD_LAZY);
    *(void **) (&ptrMyFunc) = dlsym(dllHandle, "myFunc");
    return (*ptrMyFunc)(argc, argv);
#endif

}
Nicol Bolas

1 Answer


I think there are several reasons, some of which are also wrong.

The myFunc function (I'll not comment on the bad choice of name) seems to require UTF-8 arguments. On macOS and Linux you can expect that, but not on Windows. I think this is an error: one should not assume UTF-8; instead the program should check the command-line encoding.

But Windows adopted Unicode early, before UTF-8 became a thing (and possibly before Unicode discovered that it could not fulfill its initial goal: all characters in 16 bits). So Windows uses UCS-2 or UTF-16 internally, and externally there is ANSI, etc., but not really a standard (it depends on the language of Windows). The Windows function CommandLineToArgvW parses the command line into Unicode strings, and it is the way to get the command line if you have non-ASCII characters (really, one should use wmain). I'm not an expert in the Windows API, but the above code smells. It seems like some copy-paste to solve a few existing problems (e.g. command lines with non-ASCII chars, assuming only modern macOS/Linux otherwise) without understanding the problem and finding a clean solution.
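The ConvertToUtf8 helper in the question is not shown, but presumably it turns the UTF-16 strings returned by CommandLineToArgvW into UTF-8 char strings (on Windows the usual tool for that is WideCharToMultiByte with CP_UTF8). As a rough, portable sketch of what such a conversion involves — Utf16ToUtf8 is my own hypothetical name, not the question's code, and it skips validation of lone surrogates:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: convert one UTF-16 string (what Windows wide
// strings are) into UTF-8. Uses char16_t so it compiles anywhere;
// real Windows code would more likely call WideCharToMultiByte.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        // Encode the code point as 1-4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The point is that Windows hands the program UTF-16, so some conversion like this must happen before a char*-based myFunc can receive non-ASCII arguments losslessly.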

Giacomo Catenazzi
  • Giacomo Catenazzi - I lost you in these lines. But Windows adopted early Unicode, before UTF-8 became a thing (and possibly before Unicode discovered that it cannot fulfill initial goal: all characters in 16bit). So Windows uses internally UCS-2 or UTF-16, and externally there is ANSI, etc. but not really a standard (it depends on Language of Windows). –  Jun 12 '20 at 15:27
  • `Windows uses internally UCS-2 or UTF-16, and externally there is ANSI, etc` Not sure what you *really* meant by this, but it is plain wrong as written. From Microsoft's [Unicode - Win32 apps](https://learn.microsoft.com/en-us/windows/win32/intl/unicode): "*The system uses Unicode exclusively for character and string manipulation*". The same page also covers UTF-8, and the legacy 8-bit codepages that are still supported for backwards compatibility. – dxiv Jun 13 '20 at 02:26
  • 1
    Microsoft was an early adopter of Unicode. Windows used UCS-2 for its Unicode handling until Windows 2000, then it switched to UTF-16 and has been using that ever since. Just about all Win32 ANSI-based APIs convert to Unicode internally before processing, and then convert results back to ANSI on output. It is best to avoid that conversion when possible, using Unicode-based APIs directly. – Remy Lebeau Jun 14 '20 at 21:20
  • @dxiv: MS uses Unicode internally. See filesystems or resources (if you read them as binary data). But RemyLebeau put it much better. – Giacomo Catenazzi Jun 15 '20 at 12:08
  • @GiacomoCatenazzi Not sure what you mean by "*internally*". The entire Win32 public API [is Unicode](https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes), and has been since NT 3.5 last millennium. From the linked page: "*New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and for ease of localization*". In this context "*new*" means 1994 and later. – dxiv Jun 15 '20 at 15:14
  • @dxiv: yes, "relatively new" (because UTF-8 was not yet a thing). The code in the question is *portable* code (not really, but anyway), so it requires zero-terminated char strings. The WIN32 part seems to me just a hack to get ANSI (or other) chars from the terminal. Or that is how I interpreted the code: how to work around `main()` to get wchar, but then continue as a normal C program. The `#ifdef _WIN32` tells me that we are not expecting to use the Win32 API everywhere in the code. Maybe I was too zealous in introducing the unwanted historical context. – Giacomo Catenazzi Jun 15 '20 at 15:40