
Why can printf() display é (\u00E9 in UTF-16) when putwchar() can't?

And what is the right syntax to get putwchar() to display é correctly?

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
  wint_t wc = L'\u00E9';

  setlocale(LC_CTYPE, "fr_FR.utf8");

  printf("%C\n", wc);
  putwchar((wchar_t)wc);
  putchar('\n');

  return 0;
}

Environment

  • OS : openSUSE Leap 42.1
  • compiler : gcc version 4.8.5 (SUSE Linux)
  • Terminal : Terminator
  • Terminal encoding : UTF-8
  • Shell : zsh
  • CPU : x86_64

Shell env :

env | grep LC && env | grep LANG
LC_CTYPE=fr_FR.utf8
LANG=fr_FR.UTF-8
GDM_LANG=fr_FR.utf8

Edit

in:

wint_t wc = L'\u00E9';
setlocale(LC_CTYPE, "");

out:

C3 A9 0A E9 0A

in:

wint_t wc = L'\xc3a9';               
setlocale(LC_CTYPE, "");

out:

EC 8E A9 0A A9 0A
noraj
  • I think we'd need to know more about your environment (e.g. OS, compiler, terminal). – William McBrine Mar 20 '16 at 02:50
  • @WilliamMcBrine: Sorry, I forgot; I was tired. I just added some information, I hope it will be useful. – noraj Mar 20 '16 at 18:33
  • Your code works for me with `setlocale(LC_CTYPE, "");` allowing it to choose the native locale (which for me is defined by `LANG=en_US.UTF-8`). Hard coding the locale is probably a bad idea anyway. – Schwern Mar 20 '16 at 19:48
  • @Schwern: Do you get two `é`? One from printf (works for me) and one from putwchar (doesn't work for me)? – noraj Mar 20 '16 at 21:03
  • @ImproveYourMind Yes, two é's. When I look at the output of your original code in a hex editor I get `e90a e90a`. `0a` is newline. `e9` is its UTF-16 representation. You don't want UTF-16. Using `setlocale(LC_CTYPE, "")` gives `c3a9 0ac3 a90a`. `0a` is still newline. `c3a9` is its UTF-8 representation, which is what you want. See http://www.fileformat.info/info/unicode/char/00e9/index.htm – Schwern Mar 20 '16 at 21:13
  • @Schwern : As you can see in my post edit, the result is not the same for me. – noraj Mar 20 '16 at 21:50
  • If stdout is set to UTF-8 then putting a UCS-2 character is not going to give the desired result – M.M Mar 20 '16 at 22:03
  • @ImproveYourMind Yeah, `printf` is outputting UTF-8, but `putwchar` is outputting the UTF-16 representation. Puzzling. Check the return value of `putwchar`? – Schwern Mar 20 '16 at 22:03
  • BTW `"%C"` is not defined by ISO C, if it is working for you it must be a compiler extension. I guess it performs UTF-8 conversion. – M.M Mar 20 '16 at 22:06
  • `u'\u00E9'` is another thing to try – M.M Mar 20 '16 at 22:34
  • @M.M: `"%C"` is not ISO, but if I use `"%lc"` (which is ISO) I get the same output (presumably because `"%C"` either behaves like `"%lc"` or is not implemented). And `u'\u00E9'` gives me an error because it is not defined. – noraj Mar 20 '16 at 22:45
  • Your compiler must only have partial C11 support, as `u` character literals are defined by C11 – M.M Mar 20 '16 at 22:51
  • I was reading POSIX again and saw that `"%C"` is an [XSI] extension: *The functionality described is an XSI extension. Functionality marked XSI is also an extension to the ISO C standard. Application writers may confidently make use of an extension on all systems supporting the X/Open System Interfaces Extension.* [fprintf, which covers printf](http://pubs.opengroup.org/onlinepubs/009695399/functions/fprintf.html) – noraj Mar 20 '16 at 22:55
  • What happens if you output to file by both methods? (set locale, then open the file with `fopen`, then do `putwchar` as the first function). hexdump the file to see what happens. – M.M Mar 20 '16 at 22:56
  • I saw in POSIX [fputwc, which covers putwchar](group.org/onlinepubs/9699919799/functions/fputwc.html) that in *ISSUE 5* the type of the argument wc changed from wint_t to wchar_t. But changing my input accordingly makes no difference. – noraj Mar 20 '16 at 22:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/106863/discussion-between-improveyourmind-and-m-m). – noraj Mar 20 '16 at 23:00

2 Answers


You cannot mix wide-character and byte input/output functions on the same stream (printf is a byte output function, regardless of whether its format string includes wide-character conversions). The orientation of a stream can only be reset with freopen, which must be called again before using the byte-oriented putchar function.

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    wint_t wc = L'\u00E9';

    setlocale(LC_CTYPE, "");

    printf("%lc\n", wc);
    freopen(NULL, "w", stdout);
    putwchar((wchar_t)wc);
    freopen(NULL, "w", stdout);
    putchar('\n');

    return 0;
}

The fact that the orientation can only be changed by reopening the stream indicates that this is not meant to be done casually; most programs should use only one kind of output: either wprintf/putwchar, or printf/putchar, with printf or wctomb when you need to print a wide character.

Random832
  • Somehow on my Android device under Termux it works well without freopen. Only on my x86 machine I need to call freopen to mix printf and putwchar. – neoexpert Feb 17 '20 at 18:35
  • @neoexpert my answer was based on the C standard - it's undefined behavior, so it depends on the runtime library, it might work, it might not, it might write garbage, it might crash, etc – Random832 Feb 19 '20 at 03:46

The problem is that your setlocale() call failed. If you check its return value you'll see that.

  if( !setlocale(LC_CTYPE, "fr_FR.utf8") ) {
      printf("Failed to set locale\n");
      return 1;
  }

The likely cause is that fr_FR.utf8 is not a locale name your system recognizes. Instead, use the form found in LANG: fr_FR.UTF-8.

  if( !setlocale(LC_CTYPE, "fr_FR.UTF-8") ) {
      printf("Failed to set locale\n");
      return 1;
  }

The locale names are whatever is installed on your system, probably in /usr/share/locale/. Or you can get a list with locale -a.

You rarely want to hard-code a locale. Usually you want whatever the environment specifies. To do this, pass "" as the locale and the program will pick it up from the environment.

  if( !setlocale(LC_CTYPE, "") ) {
      printf("Failed to set locale\n");
      return 1;
  }
Schwern