I have a UTF-8 encoded roff file that I want to convert to a man page with
$ nroff -mandoc inittab.5
However, characters such as [äöüÄÖÜ] are not displayed properly; it seems that nroff assumes ISO 8859-1 encoding (I get [äöüÃÃÃ] instead). Calling nroff with the -Tutf8 flag does not change the behaviour, and the locale environment variables are (I assume properly) set to
LANG=de_DE.utf8
LC_CTYPE="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_PAPER="de_DE.utf8"
LC_NAME="de_DE.utf8"
LC_ADDRESS="de_DE.utf8"
LC_TELEPHONE="de_DE.utf8"
LC_MEASUREMENT="de_DE.utf8"
LC_IDENTIFICATION="de_DE.utf8"
LC_ALL=
Since nroff is only a wrapper script that eventually calls groff, I checked the call to the latter, which is:
$ groff -Tutf8 -mandoc inittab.5
Comparing the byte encodings of the characters in the source file and the output file, I get the following conversions:
character  src file  output file
---------  --------  -----------
ä          C3 A4     C3 83 C2 A4
ö          C3 B6     C3 83 C2 B6
ü          C3 BC     C3 83 C2 BC
Ä          C3 84     C3 83
Ö          C3 96     C3 83
Ü          C3 9C     C3 83
ß          C3 9F     C3 83
This behaviour seems very odd to me: why do I get an additional C3 83, and why is the rest of the original byte sequence dropped altogether for the uppercase umlauts and ß?
Why does this happen, and how can I make nroff/groff convert my UTF-8 encoded file properly?
EDIT: I am using GNU nroff (groff) version 1.22.2
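To check the ISO 8859-1 suspicion against the table above, here is a minimal sketch that treats each source byte as an ISO 8859-1 character and re-encodes it as UTF-8. The helper name misdecode is made up for illustration, and the step that discards bytes in the range 0x80-0x9F is my assumption about how the input is handled, not something I have confirmed in groff itself. It relies on printf, tr, iconv, od and xargs being available.

```shell
# Hypothetical helper: takes a byte sequence written as printf octal escapes,
# reads it as ISO 8859-1, re-encodes it as UTF-8 and prints the result in hex.
misdecode() {
    printf "$1" \
        | LC_ALL=C tr -d '\200-\237' \
        | iconv -f ISO-8859-1 -t UTF-8 \
        | od -An -tx1 \
        | xargs
    # tr step: ASSUMPTION that bytes 0x80-0x9F (C1 controls in ISO 8859-1)
    # are silently discarded before re-encoding.
}

misdecode '\303\244'   # ä (C3 A4) -> c3 83 c2 a4
misdecode '\303\266'   # ö (C3 B6) -> c3 83 c2 b6
misdecode '\303\274'   # ü (C3 BC) -> c3 83 c2 bc
misdecode '\303\204'   # Ä (C3 84) -> c3 83
misdecode '\303\237'   # ß (C3 9F) -> c3 83
```

Under these assumptions the output matches the "output file" column: the leading C3 becomes C3 83 (the UTF-8 encoding of Ã), and the second byte either becomes a two-byte sequence (ä, ö, ü) or disappears (Ä, Ö, Ü, ß).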