Linux, field_buffer does not provide a UTF-8 string

Question

In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:

char *dest;
...
dest = field_buffer(field[0], 0);

If I try to type a non-ASCII UTF-8 character into the field with this code, the character does not appear and is not handled. In this answer about UTF-8 it is suggested to use ncursesw. But with the following code (written following this guide)

#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>

int main()
{
        ...
        setlocale(LC_ALL, "");
        ...
        initscr();


        wchar_t *dest;
        ...
        dest = field_buffer(field[0], 0);
}

the compiler produces a warning:

warning: assignment from incompatible pointer type [enabled by default]
  dest = field_buffer(field[0], 0);
            ^

How can I obtain an array of wchar_t from the field? ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.

I don't follow. You wrote that `UTF-8` is OK, because `field_buffer` returns `char *`. If you use `ncursesw`, which is implemented for wide chars (`UTF-16`), you cannot use the `field_buffer` function. I'm not an expert on `ncursesw`. — LPs, Oct 01 '15 at 13:31
@LP, `wchar_t` *can be* UTF-16, but it is in no way required to be. Its width may be smaller or larger than 16 bits, and the character encoding implicit in it is unspecified. In C2011, though, there is `char16_t`, which often is UTF-16. You can tell (for `char16_t`) based on whether macro `__STDC_UTF_16__` is defined. — John Bollinger, Oct 01 '15 at 13:44
@LPs I don't follow you either :). **In order to get** support for `UTF-8`, I used `ncursesw` and `wchar_t`. If I can't use `field_buffer`, which function should I use? — BowPark, Oct 01 '15 at 13:47
It is probably my mistake, but the `char` type is already `UTF-8`, so you don't need to use `ncursesw`. — LPs, Oct 01 '15 at 13:50
@LPs I tried to put some `UTF-8` characters in the field but unfortunately it did not work with `char`. — BowPark, Oct 01 '15 at 13:53
@BowPark, the ncursesw library is an *extended* version of ncurses. It does not redefine existing functions (what a mess that would make!), rather, it provides *additional* functions aimed at supporting multibyte characters. — John Bollinger, Oct 01 '15 at 13:54
@JohnBollinger You are absolutely right. So, my question is: which is the `ncursesw`-equivalent function of `field_buffer()`? — BowPark, Oct 01 '15 at 13:57
@BowPark, as far as I know or can tell, there is no version of the `field_` routines that works with or returns `wchar_t`. As I understand it, you are expected to use `setlocale()` before starting in with the ncurses functions. Choose a locale that supports Unicode and uses UTF-8 encoding, and the ordinary `field_` routines should work -- at least as well as ever they will do for wide characters, anyway. Here is some general advice: http://www.roguebasin.com/index.php?title=Ncursesw — John Bollinger, Oct 01 '15 at 14:11
@JohnBollinger If you look at the headers included in the question, you will find that they are the ones suggested in the guide you linked. I already followed all 8 steps. My `locale` is `UTF-8`, but the compiler still gives me the warning I quoted. And anyway: even if all the settings were correct, should I use the `field_buffer()` function with `wchar_t` or `char`? — BowPark, Oct 01 '15 at 14:28
@BowPark, `field_buffer()` returns a `char *`. That is what you use with it. If you have set an appropriate locale then the buffer you obtain that way should be encoded in UTF-8. To some extent, this should be transparent when you are using the system's default locale. — John Bollinger, Oct 01 '15 at 14:34
@JohnBollinger Ok! When I used `char` with a `UTF-8` locale, the non-ASCII characters were not displayed in the field on the screen (unlike the ASCII ones, which appeared correctly). Do you think it could be a problem with my environment rather than with `ncurses`? — BowPark, Oct 01 '15 at 15:00
@BowPark, I have no better advice than that provided at the roguebasin link I gave you earlier. The problem could be the locale, the console program, the console font, or even the data. It might be which ncurses lib you are linking. There are probably other possibilities. All I can tell you with confidence is that `field_buffer` returns `char *`, not `wchar_t *`, that under suitable conditions you can use ncursesw with multibyte characters, that ncurses depends on the locale to determine how to handle characters, and that if the locale so indicates, it works with UTF-8. — John Bollinger, Oct 01 '15 at 15:11
@JohnBollinger Now, after several attempts, it works. The program gets and prints `UTF-8` characters. If you would like to write an answer, I will accept it. Either way, thank you! — BowPark, Oct 01 '15 at 15:14
@BowPark, I suggest you answer your own question, including the code you ultimately used and describing anything you had to do outside the code to make it work. That would constitute a far better answer than any I could write. — John Bollinger, Oct 01 '15 at 15:56

score 0 · Accepted Answer · answered Oct 01 '15 at 16:46

The program is compiled on a system with the following locale:

$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=

This locale supports and uses UTF-8 by default. With a locale like this, when ncursesw is used, the C program can store UTF-8 characters in a char array. To set up ncursesw correctly, it is very important to follow all the steps of the mentioned guide. In particular, the program should start with

#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>

The program should be compiled as

gcc -o executable_file source_file.c -lncursesw -lformw

and the program should contain

setlocale(LC_ALL, "");

before initscr(). With all these conditions satisfied, the string can be stored in a normal char array, just as with ncurses and ASCII. As John Bollinger pointed out in the comments, field_buffer only ever returns a char *, so there is no point in assigning its result to any other type such as wchar_t *.
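For anyone who still wants wide characters after reading the field, here is a minimal sketch of two helpers that work on the char * returned by field_buffer. The names copy_trimmed and to_wide are hypothetical and not part of ncurses. The first strips the trailing spaces that the form library uses to pad the buffer to the field width. The second converts the result to wchar_t with the standard mbstowcs, which assumes setlocale(LC_ALL, "") has already selected a UTF-8 locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: copy a field buffer into dst without the trailing
 * spaces used as padding. It works on raw bytes, which is safe for UTF-8
 * because the byte 0x20 never occurs inside a multibyte sequence.
 * Returns the length in bytes of the string stored in dst. */
size_t copy_trimmed(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    while (len > 0 && src[len - 1] == ' ')
        len--;

    if (len >= dst_size) {
        len = dst_size - 1;
        /* Do not cut a UTF-8 sequence in half: back up while the first
         * dropped byte is a continuation byte (bit pattern 10xxxxxx). */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            len--;
    }

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Hypothetical helper: convert a multibyte string in the current locale's
 * encoding to wide characters. dst_len is the capacity of dst in wchar_t
 * units, including the terminator. Returns the number of wide characters
 * written, or (size_t)-1 if src contains an invalid sequence. */
size_t to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL && dst_len > 0);

    size_t n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1) {
        dst[0] = L'\0';
        return n;
    }
    dst[n] = L'\0';   /* mbstowcs does not terminate when the limit is hit */
    return n;
}
```

In the form program this would be used roughly as copy_trimmed(text, sizeof text, field_buffer(field[0], 0)), followed by to_wide(wide, wide_len, text) if wchar_t is really needed. Note that field_buffer is only guaranteed to reflect what was typed once the field has been validated, for example after form_driver(form, REQ_VALIDATION) or after the cursor leaves the field.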
