While writing POD documentation, I noticed that the Unicode character ŷ (U+0177) was replaced by X in the output.

Input:


    =pod

    =encoding utf8

    =over

    =item I<yt> (ŷ(t))

    The value predicted for time I<t>.

    =back

Output in PuTTY:

[Screenshot: output in PuTTY window (yellow mark)]

Input in Emacs:

[Screenshot: POD input in Emacs]

The perldoc in use is the one shipped with Perl 5.18.2 (SLES12 SP4, perl-5.18.2-12.20.1.x86_64), and LANG=en_US.UTF-8.

Update:

It seems to be a bug in Perl or in the SLES12 SP4 package: running the same test on openSUSE Leap 15.1 with Perl 5.26.1, the output looks OK:

    yt (ŷ(t))
        The value predicted for time t.

However, when using pod2man from perl-5.26.1-15.87.x86_64 on openSUSE Leap 15.3, the output is not correct. On the other hand, perldoc produces correct output there as well.
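To rule out the source file itself, here is a minimal sketch in Python (not part of the original test) that checks whether the POD bytes decode as UTF-8 and reports how each non-ASCII character is encoded. The helper names `describe_char` and `non_ascii_chars` are made up for illustration, and the POD text is embedded as a string so the sketch runs on its own; for a real file you would read the bytes from disk instead.

```python
# Sketch only: the POD snippet from the question, embedded as a string.
# U+0177 is written as an escape so the script does not depend on its own encoding.
POD_SOURCE = (
    "=pod\n\n=encoding utf8\n\n=over\n\n"
    "=item I<yt> (\u0177(t))\n\n"
    "The value predicted for time I<t>.\n\n=back\n"
)


def describe_char(ch):
    """Return code point, UTF-8 bytes (hex), and whether Latin-1 can encode ch."""
    try:
        ch.encode("latin-1")
        in_latin1 = True
    except UnicodeEncodeError:
        in_latin1 = False
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "utf8_hex": ch.encode("utf-8").hex(),
        "in_latin1": in_latin1,
    }


def non_ascii_chars(raw):
    """Decode raw bytes as UTF-8 (raises UnicodeDecodeError if invalid)
    and return the distinct non-ASCII characters, sorted."""
    text = raw.decode("utf-8")
    return sorted({c for c in text if ord(c) > 127})


raw = POD_SOURCE.encode("utf-8")  # stand-in for open(path, "rb").read()
for ch in non_ascii_chars(raw):
    print(describe_char(ch))
```

For the snippet above this reports a single non-ASCII character, U+0177, encoded as the two bytes `c5 b7` and not representable in ISO Latin-1. That only confirms the input is well-formed UTF-8; it says nothing about what perldoc or pod2man do with it afterwards.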

U. Windl
  • Is the file actually encoded as UTF-8? – Jörg W Mittag Apr 27 '20 at 07:12
  • Did you notice the three `U` (`UUU`) in the status line of Emacs? `file` says `awk or perl script, UTF-8 Unicode text`. – U. Windl Apr 27 '20 at 08:28
  • I'm not familiar with Emacs, sorry. I saw the `Char` line and I verified that U+0177 is, in fact, the correct code point for *Latin small letter y with circumflex*, but other than that, the status line may as well be written in Chinese for me. – Jörg W Mittag Apr 27 '20 at 11:38
  • My guess is that "something" in your entire processing pipeline (starting with perldoc and ending with PuTTY) either doesn't support Unicode, or does not have that particular glyph in its font, or *thinks* that some other component of the pipeline does not support it, and thus replaces the glyph which it believes cannot be rendered with an X. (The correct glyph to use would actually be the Unicode replacement symbol, which is sometimes a square with a cross inside it, but I guess if some component in your pipeline thinks there is a problem with `ŷ`, it will also think there is a problem with … – Jörg W Mittag Apr 27 '20 at 11:43
  • … the `�` character.) – Jörg W Mittag Apr 27 '20 at 11:44
  • @JörgWMittag I can't follow your arguments: Emacs is displaying the correct glyph in PuTTY. So how can you conclude "...doesn't support Unicode, or does not have that particular glyph in its font"? Also most ISO Latin (8-bit) characters are embedded in the Unicode positions, so if a character existed in ISO Latin-1, it will have (in most cases) the same code position in Unicode. – U. Windl Apr 27 '20 at 12:47