The following program returns incorrect values {-1, 0, -1} on HP-UX, whereas the same program works correctly on Linux for the locale "de_DE.iso885915@euro". Is there a known issue with wcwidth, iswprint, and wcswidth on HP-UX?

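#define _XOPEN_SOURCE 500 /* may be needed (e.g. with glibc) to declare wcwidth and wcswidth */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
    wchar_t str[2];
    wchar_t ch = 8364; /* Euro sign, U+20AC */

    str[0] = ch;
    str[1] = L'\0';

    /* Locale set to de_DE.iso885915@euro before running this program */
    setlocale(LC_ALL, "");

    printf("%d\n", wcwidth(ch));          /* display columns, or -1 if not printable */
    printf("%d\n", iswprint((wint_t)ch)); /* nonzero if printable in this locale */
    printf("%d\n", wcswidth(str, 2));     /* stops at the terminating null */

    return 0;
}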
Manya K

1 Answer

It's possible that HP-UX does not use Unicode as the encoding for wchar_t but instead simply stores the 8-bit char values in a 32-bit wchar_t when using 8-bit locales. If so, the value 8364 (U+20AC) would not correspond to any character in that locale, which would explain the -1 and 0 results. This is an ugly, old-fashioned practice that's generally frowned upon now, but it's legal per the C standard; in fact, the C standard allows and encourages implementations to provide the predefined macro __STDC_ISO_10646__ to indicate that wchar_t values are Unicode. If you switch to a UTF-8 based locale and the problem goes away, this is almost certainly the issue you're having.
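
One way to check this hypothesis is sketched below. It is a rough diagnostic, not a definitive test: `wchar_is_unicode` and `wide_value_of_byte` are hypothetical helper names introduced here for illustration, and the code relies only on the standard `__STDC_ISO_10646__` macro and the standard `btowc` function.

```c
#include <wchar.h>

/* Hypothetical helper: returns 1 if the implementation defines
   __STDC_ISO_10646__ (wchar_t values are Unicode code points), else 0. */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns the wide-character value that the current
   locale assigns to a single byte, or -1 if the byte is not a valid
   single-byte character in that locale. */
long wide_value_of_byte(unsigned char byte)
{
    wint_t wc = btowc(byte);
    return (wc == WEOF) ? -1L : (long)wc;
}
```

In ISO-8859-15 the euro sign is the byte 0xA4. After `setlocale(LC_ALL, "")` under de_DE.iso885915@euro, `wide_value_of_byte(0xA4)` should give 8364 if wchar_t is Unicode; a different value (for example 164) would suggest that the implementation stores the raw byte instead.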

R.. GitHub STOP HELPING ICE
  • The problem occurs only with de_DE.iso885915 and ko_kr.iso885915; it works well with other locales such as UTF-8 and EUC. In fact, it also works fine for fr_FR.iso885915@euro. Strange issue. Is there a patch available from HP as of now? – Manya K Jun 30 '12 at 02:37
  • Probably not; this is not a bug but an allowed behavior, so at worst it's just a "quality of implementation" issue. This is 2012 anyway. You should not be using non-UTF-8 locales. If you have legacy data, process it with `iconv`. – R.. GitHub STOP HELPING ICE Jun 30 '12 at 02:56
  • iconv? Is it possible to get the display width of a character using iconv? I am using it only for conversion. Is there any other way to get the display width of a wide character besides wcwidth? – Manya K Jun 30 '12 at 03:14
  • You can use `iconv` to convert from `UTF-8` (or whatever your preferred representation of the Unicode character is) to `WCHAR_T`, then call `wcwidth`, etc. on the result. – R.. GitHub STOP HELPING ICE Jun 30 '12 at 03:33
  • In fact, I am doing that; that's how I got ch = 8364. I left the conversion code out of the sample program to keep it simple. – Manya K Jun 30 '12 at 05:06
  • You used `iconv` from `UTF-8` to `WCHAR_T`, *with the locale set to your legacy 8-bit locale*? If so, and if it generated 8364 in that case, then there's a bug. – R.. GitHub STOP HELPING ICE Jun 30 '12 at 12:18
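
The conversion discussed in the comments above might look roughly like the following. This is a minimal sketch under stated assumptions, not the asker's actual code: `utf8_to_wchar` is a hypothetical helper name, and the encoding name `"WCHAR_T"` is accepted by glibc and GNU libiconv but may be spelled differently or be unavailable in other iconv implementations, including the one on HP-UX.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the first character of a UTF-8 string to
   wchar_t using iconv. Returns 0 on success, -1 on failure. */
int utf8_to_wchar(const char *utf8, wchar_t *out)
{
    /* Assumption: the iconv implementation knows the name "WCHAR_T". */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)utf8;        /* iconv takes a non-const input pointer */
    size_t in_left = strlen(utf8);
    char *dst = (char *)out;
    size_t out_left = sizeof *out;  /* room for exactly one wide character */

    size_t rc = iconv(cd, &in, &in_left, &dst, &out_left);
    int saved_errno = errno;
    iconv_close(cd);

    /* E2BIG only means the input held more than one character; the first
       one has still been converted. Any other error is a real failure. */
    if (rc == (size_t)-1 && saved_errno != E2BIG)
        return -1;

    /* Success only if one full wchar_t was written. */
    return (out_left == 0) ? 0 : -1;
}
```

The resulting value can then be passed to `wcwidth` or `iswprint` after `setlocale(LC_ALL, "")`. Whether that value is meaningful in a legacy 8-bit locale depends on how the platform encodes wchar_t, which is the point at issue in this thread.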