I don't know why this code doesn't work:

use strict;
use warnings;
use Encode qw/decode/;
my $entity_unicode = "00A0";
$entity_unicode = decode("UTF-16", pack('H4', $entity_unicode));
print $entity_unicode, "\n";

It prints out: "UTF-16:Unrecognised BOM a0 at /usr/lib/perl/5.10/Encode.pm line 174.".

XoR

1 Answer

Without a BOM (U+FEFF) at the start of the string to decode, there is no way to know whether 00 A0 is U+00A0 (UTF-16be) or U+A000 (UTF-16le, used by Windows). One must specify the exact encoding when the BOM is absent. In this case, that's UTF-16be.
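As a rough illustration of the point above, here is a minimal sketch that decodes the same two bytes three ways: as UTF-16BE, as UTF-16LE, and as plain UTF-16 with a BOM prepended by hand. The variable names ($bytes, $be, $le, $with_bom) are made up for this example, and the code points are printed in U+XXXX form so the output does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use Encode qw/decode/;

my $entity_unicode = "00A0";
my $bytes = pack('H4', $entity_unicode);    # two bytes: 0x00 0xA0

# Name the byte order explicitly, since the data carries no BOM.
my $be = decode("UTF-16BE", $bytes);        # big-endian reading
my $le = decode("UTF-16LE", $bytes);        # little-endian reading

# Alternatively, prepend a big-endian BOM (FE FF) so plain "UTF-16" can detect the order.
my $with_bom = decode("UTF-16", "\xFE\xFF" . $bytes);

printf "UTF-16BE:     U+%04X\n", ord($be);        # U+00A0
printf "UTF-16LE:     U+%04X\n", ord($le);        # U+A000
printf "UTF-16 + BOM: U+%04X\n", ord($with_bom);  # U+00A0
```

If the decoded character itself is to be printed rather than its code point, an output layer such as binmode(STDOUT, ':encoding(UTF-8)') is probably wanted first, assuming a UTF-8 terminal.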

ikegami