
Code:

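#include <stdio.h>
#include <wchar.h>
#define USE_W   /* comment this out to take the printf branch */
int main(void)
{
#ifdef USE_W
    /* U+00E6 (æ) as a wide character, followed by ASCII text */
    const wchar_t *ae_utf16 = L"\x00E6 & ASCII text ae\n";
    wprintf(L"%ls", ae_utf16);
#else
    /* the same character as the two UTF-8 bytes C3 A6 */
    const char *ae_utf8 = "\xC3\xA6 & ASCII text ae\n";
    printf("%s", ae_utf8);
#endif
    return 0;
}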

Output (with USE_W defined, i.e. using wprintf):

ae & ASCII text ae

printf, by contrast, produces correct UTF-8 output:

æ & ASCII text ae

You can test this here.

R. Martinho Fernandes
user206334

1 Answer


printf just sends raw bytes to your terminal; it knows nothing about encodings. If your terminal happens to be configured to interpret those bytes as UTF-8, it will show the right characters.

wprintf, on the other hand, does know about encodings. It behaves as though it uses the function wcrtomb, which encodes a wide character (wchar_t) into a multibyte sequence according to the current locale. A C program starts in the "C" locale, which is quite minimalistic, and stays there until it calls setlocale; in that locale, the character æ here gets converted to the "more or less equivalent" byte sequence ae.

If you set the locale explicitly to something using UTF-8, like "en_US.UTF-8", the output is as expected. Of course, the set of supported locales differs per system, so it's no good to hardcode this.

Thomas
  • Thank you for information about the requirement to set a locale before using wprintf. – user206334 Apr 08 '13 at 10:40
  • This works on Linux. On Windows, trying to set the locale to a UTF-8 code page [will fail](https://msdn.microsoft.com/en-us/library/x99tb11d.aspx). AFAICT, `wprintf` cannot be used to print a UTF-8 string there. [WriteConsole](https://msdn.microsoft.com/en-us/library/windows/desktop/ms687401(v=vs.85).aspx) is required. – mgiuffrida Dec 03 '16 at 05:24