I've noticed that Pod::Usage, pod2man, and even pod2markdown produce the wrong encoding in their output for certain characters. These programs emit the copyright symbol as the single byte 0xA9, which is its Unicode code point as well as its ISO-8859-1 and CP1252 encoding, rather than its UTF-8 encoding, which should be the two-byte sequence 0xC2 0xA9.
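
To make the expected bytes concrete, here is a minimal sketch using the core `Encode` module. `hex_bytes` is a hypothetical helper written only for this illustration; it is not part of any of the Pod modules mentioned.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '0x%02X', ord($_) } split //, $bytes;
}

my $copyright = chr(0xA9);    # one character, code point U+00A9

# The same character serialized under two different encodings.
my $latin1 = encode('ISO-8859-1', $copyright);
my $utf8   = encode('UTF-8',      $copyright);

print 'ISO-8859-1: ', hex_bytes($latin1), "\n";    # 0xA9
print 'UTF-8:      ', hex_bytes($utf8),   "\n";    # 0xC2 0xA9
```

The code point (0xA9) and the ISO-8859-1 byte coincide, which is why output in the wrong encoding can look "almost right" for this range.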

The issue seems to involve Pod::Escapes, which I've updated to version 1.07 (the latest version), and utf8::unicode_to_native (which I can't find).

Looking at Pod::Escapes, the %Name2character_number hash maps the key copy to the Unicode code point 0xA9 (169), which is correct.

However, the %Name2character hash is getting the wrong representation from the utf8::unicode_to_native subroutine. In fact, all of the Unicode code points from 0x80 to 0xFF come out as their single-byte representation and not as their UTF-8 encoding. All characters above 0xFF come out correctly.
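
For comparison, the same pattern (single bytes for 0x80 to 0xFF, UTF-8 above 0xFF) is what a plain Perl `print` produces on a filehandle that has no encoding layer. The sketch below only demonstrates that general behavior; whether it is what happens inside the Pod tools has not been confirmed. `bytes_written` is a hypothetical helper, and `:raw` stands in for "no encoding layer".

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle opened with
# $layer and return the bytes that were written, as hex values.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "Cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print" here
        print {$fh} $text;
    }
    close $fh;                 # flush before reading the buffer
    return join ' ', map { sprintf '0x%02X', ord($_) } split //, $buffer;
}

my $copy   = chr(0xA9);      # U+00A9, inside the 0x80..0xFF range
my $smiley = chr(0x263A);    # U+263A, above 0xFF

print 'U+00A9, no encoding layer: ', bytes_written($copy,   ':raw'), "\n";
print 'U+263A, no encoding layer: ', bytes_written($smiley, ':raw'), "\n";
print 'U+00A9, UTF-8 layer:       ',
    bytes_written($copy, ':encoding(UTF-8)'), "\n";
```

Without a layer, U+00A9 is written as the single byte 0xA9 while U+263A is written as 0xE2 0x98 0xBA; with `:encoding(UTF-8)` on the handle, U+00A9 becomes 0xC2 0xA9.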

Is there a way to fix this issue? I am running Perl 5.18.2 on Mac OS X 10.10 (Yosemite) which is natively utf-8.

ikegami
David W.
  • "Native" refers to ASCII vs EBCDIC. `unicode_to_native` is not supposed to encode, so it's returning what it should. The problem is elsewhere. – ikegami Jan 13 '15 at 18:08
  • If you are sure this is incorrect behavior, then I highly recommend you report it to the Perl community. You would be doing EVERYONE a great favor by getting this fixed in the next version of Perl :) – mareoraft Jan 14 '15 at 04:44
  • I was hoping it was something I was _doing wrong_. For example, I have my environment set incorrectly, or the issue is with my `groff` or `troff` on my Mac and this isn't a Perl issue, but a Mac issue. Playing around with this more, I realize this happens with all the characters encoded from `0x80` to `0xFF`, but works fine for anything encoded above `0xFF`. – David W. Jan 14 '15 at 11:49

0 Answers