I am trying to format a wchar_t* string containing non-ASCII characters as UTF-8 using vsnprintf, and then print the buffer using printf.
Given the following code:
/*
This code is a modified version of the KB sample:
https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm
The usage of `setlocale` is required by my real-world scenario,
but can be modified if that fixes the issue.
*/
#include <wchar.h>
#include <stdarg.h>
#include <stdio.h>
#include <locale.h>
#ifdef MSVC
#include <windows.h>
#endif
void vout(char *string, char *fmt, ...)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    vsnprintf(string, 100, fmt, arg_ptr);
    va_end(arg_ptr);
}
int main(void)
{
    setlocale(LC_ALL, "");
#ifdef MSVC
    SetConsoleOutputCP(65001); // with or without; no dice
#endif
    char string[100];
    wchar_t arr[] = { 0x0119, 0 }; // %ls expects a null-terminated wide string
    vout(string, "%ls", arr);
    printf("This string should have 'ę' (e with ogonek / tail) after colon: %s\n", string);
    return 0;
}
I compiled with gcc v5.4 on Ubuntu 16 to get the desired output in BASH:
gcc test.c -o test_vsn
./test_vsn
This string should have 'ę' (e with ogonek / tail) after colon: ę
However, on Windows 10 with CL v19.10.25019 (VS 2017), I get weird output in CMD:
cl test.c /Fetest_vsn /utf-8
.\test_vsn
This string should have 'T' (e with ogonek / tail) after colon: e
(The ę before the colon becomes T, and the character after the colon is e without the ogonek.)
Note that I used CL's new /utf-8 switch (introduced in VS 2015), but the output is the same with or without it. According to their blog post:
There is also a /utf-8 option that is a synonym for setting “/source-charset:utf-8” and “/execution-charset:utf-8”.
(My source file is already saved as UTF-8 with a BOM, and setting the execution charset apparently does not help.)
What is the minimal set of changes to the code or compiler switches that would make the output identical to gcc's?