4

Why does wcwidth return -1 (not a printable wide character) here as the width of "Ԥ" (U+0524)?

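#define _XOPEN_SOURCE 700 /* makes <wchar.h> declare wcwidth() */
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    /* Take LC_CTYPE from the environment; wcwidth depends on it. */
    setlocale(LC_CTYPE, "");

    wchar_t wc1 = L'合'; // U+5408
    int width1 = wcwidth(wc1);
    /* %lc expects a wint_t argument, hence the cast. */
    printf("%lc - print width: %i\n", (wint_t)wc1, width1);

    wchar_t wc2 = L'Ԥ'; // U+0524
    int width2 = wcwidth(wc2);
    printf("%lc - print width: %i\n", (wint_t)wc2, width2);

    return 0;
}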

Output:

合 - print width: 2
Ԥ - print width: -1
sid_com

1 Answer

3

Most likely U+0524 was not yet an assigned character when your libc's character database was generated; it was only added in Unicode 5.2. Your font may already include a glyph for it, but wcwidth consults only libc's locale data and never looks at which font is used.
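If you need some usable width even for characters your libc does not know yet, one option is to wrap wcwidth with a fallback. The following is only a minimal sketch: `width_or_guess` is a hypothetical helper name, and treating every unknown non-control character as one column wide is an assumption, not something libc or the Unicode standard guarantees.

```c
#define _XOPEN_SOURCE 700 /* makes <wchar.h> declare wcwidth() */
#include <wchar.h>

/* Hypothetical helper: use libc's answer when it has one, otherwise guess. */
static int width_or_guess(wchar_t wc)
{
    int w = wcwidth(wc);
    if (w >= 0)
        return w; /* libc knows this character */

    /* C0 and C1 control characters really are non-printable: keep -1. */
    if (wc < 0x20 || (wc >= 0x7f && wc < 0xa0))
        return -1;

    /* Assumption: anything else that libc rejects occupies one column. */
    return 1;
}
```

With this, `width_or_guess(L'Ԥ')` yields 1 whether or not the libc database already contains U+0524. The guess is wrong for wide (for example, East Asian) characters that libc does not know, so it only suits cases where an occasional misalignment is acceptable.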

  • @sid_com Unfortunately, I don't know of a better way than to check the libc documentation and/or sources and hope that it's included as a comment. –  May 04 '13 at 07:58
  • Late to the party (as usual), but I'm having this problem as well. I don't understand why the locale plays into what wcwidth returns. Does the code point differ (visually or even in meaning) depending on which locale is used? I thought the whole point of Unicode was that it is a big list of *all* characters, regardless of language. Sure, new ones might be missing from libc, but my case is the Swedish/German `Ö` (\u00d6), which is certainly not a new one. libc's wcwidth still returns -1 (libc 2.34). – pythonator Dec 30 '21 at 17:26
  • For reference, I solved it using https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c It only supports Unicode 5, but it returns the correct widths for my purposes, and it doesn't care about locale. – pythonator Dec 30 '21 at 17:26
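On the locale question raised in the comments: POSIX ties wcwidth to the current LC_CTYPE category, so a -1 for a long-established character such as `Ö` may mean that the process is still in the "C" locale (for instance because setlocale was never called or failed), not that the character is missing from libc. That is a guess about the commenter's setup, not something the thread confirms. A quick check is to look at the codeset of the active locale; `ctype_is_utf8` below is a hypothetical helper name.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the active LC_CTYPE codeset is UTF-8. */
static int ctype_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return strcmp(codeset, "UTF-8") == 0;
}
```

Call it right after `setlocale(LC_CTYPE, "")`. If it returns 0, or if setlocale itself returned NULL, the environment (LANG, LC_ALL, LC_CTYPE) is worth checking before blaming the character database.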