
I'm using this code:

use Unicode::UTF8 qw[decode_utf8 encode_utf8];
my $d = "opposite Spencer\u2019s Aliganj, Lucknow";
my $string = decode_utf8($d);
my $octets = encode_utf8($d);
print "\nSTRING :: $string";

I want the output to be:

opposite Spencer's Aliganj, Lucknow

What should I do?

friedo
Gansi
  • And how is `Unicode::UTF8` supposed to determine that U+2019 should be translated to an apostrophe? – Oct 18 '13 at 06:12
  • Do you want `'` (an ASCII apostrophe) or `’` (a Unicode quotation mark, the codepoint 0x2019)? – Slaven Rezic Oct 18 '13 at 10:19
  • If you want to convert a Unicode string to ASCII, this is a duplicate of [How can I substitute Unicode characters with ASCII in Perl?](http://stackoverflow.com/questions/2309215/how-can-i-substitute-unicode-characters-with-ascii-in-perl). – nwellnhof Oct 18 '13 at 10:52

2 Answers


If you just want the Unicode character U+2019 (’) in a Perl string, you can use any of these ways:

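use strict;
use warnings;
use open ':std', ':encoding(utf-8)';   # encode STDOUT as UTF-8

print chr(0x2019), "\n";
print "\x{2019}\n";      # braces are required for more than two hex digits (0x100 and above)
print "\N{U+2019}\n";    # named-character escape by code point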
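The question's expected output actually uses an ASCII apostrophe, which the comments also ask about. If that is the goal rather than U+2019 itself, a minimal sketch is to build the string with \x{2019} and then transliterate it with tr///. This assumes U+2019 is the only character that needs mapping; for general Unicode-to-ASCII conversion, see the question linked in the comments.

```perl
use strict;
use warnings;

# Build the string with a real U+2019 (right single quotation mark).
my $string = "opposite Spencer\x{2019}s Aliganj, Lucknow";

# Copy it and replace every U+2019 with an ASCII apostrophe (U+0027).
(my $ascii = $string) =~ tr/\x{2019}/'/;

print "$ascii\n";    # prints: opposite Spencer's Aliganj, Lucknow
```

Copying into $ascii before running tr/// leaves the original Unicode string untouched, in case you still need it.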

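In Perl, \u and \U are case-modification escapes, not Unicode escapes. So "\u2019" only titlecases the digit 2 (which leaves it unchanged), and the string ends up containing the literal text 2019. As the Perl documentation puts it: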

Case translation operators use the Unicode case translation tables when character input is provided. Note that uc(), or \U in interpolated strings, translates to uppercase, while ucfirst, or \u in interpolated strings, translates to titlecase in languages that make the distinction (which is equivalent to uppercase in languages without the distinction).
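A quick check of what Perl does with the question's literal, compared with a letter after \u and with a real \x{2019} escape. The sample strings are only illustrations.

```perl
use strict;
use warnings;

# \u titlecases the next character; "2" has no titlecase form,
# so "\u2019" is just the literal text "2019".
my $with_u = "Spencer\u2019s";
print "$with_u\n";    # prints: Spencer2019s

# On a letter, \u has a visible effect.
my $titled = "\uspencer";
print "$titled\n";    # prints: Spencer

# The Unicode character has to be written with \x{...} (or \N{U+...}).
my $quote = "Spencer\x{2019}s";
printf "%d characters, 8th is U+%04X\n", length $quote, ord substr($quote, 7, 1);
# prints: 9 characters, 8th is U+2019
```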

Suic

You're trying to parse butchered JSON: \u2019 is a JSON-style escape, not a Perl one.

You could decode the escapes yourself:

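use strict;
use warnings;
use Encode qw( decode );

# Single quotes keep the backslash; in double quotes Perl would consume \u itself.
my $incomplete_json = 'opposite Spencer\u2019s Aliganj, Lucknow';

my $string = $incomplete_json;
# Join UTF-16 surrogate pairs first, otherwise decode a single \uXXXX escape.
$string =~ s{\\u([dD][89aAbB]..)\\u([dD][cCdDeEfF]..)|\\u(....)}
            { $1 ? decode('UTF-16be', pack('H*', $1.$2)) : chr(hex($3)) }eg;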
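To see both branches of that substitution at work, here is a sketch that wraps it in a sub. The name unescape_u and the sample inputs (including the surrogate pair for U+1F600) are hypothetical, chosen only to exercise each branch.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: the same substitution as above, wrapped for reuse.
sub unescape_u {
    my ($s) = @_;
    $s =~ s{\\u([dD][89aAbB]..)\\u([dD][cCdDeEfF]..)|\\u(....)}
           { $1 ? decode('UTF-16be', pack('H*', $1.$2)) : chr(hex($3)) }eg;
    return $s;
}

# A single BMP escape goes through chr(hex(...)).
my $bmp = unescape_u('Spencer\u2019s');

# A surrogate pair is packed to bytes and decoded as UTF-16BE,
# yielding one character (U+1F600) instead of two broken halves.
my $pair = unescape_u('smile \ud83d\ude00');

printf "U+%04X\n", ord substr($bmp, 7, 1);    # prints: U+2019
printf "U+%X\n",   ord substr($pair, 6, 1);   # prints: U+1F600
```

The surrogate-pair alternative is listed first so that a high/low pair is consumed as a unit before the generic \u(....) branch can match each half separately.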

Or you could fix it up and then use an existing parser:

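use strict;
use warnings;
use JSON::XS qw( decode_json );

# Single quotes keep the backslash in \u2019 for the JSON parser.
my $incomplete_json = 'opposite Spencer\u2019s Aliganj, Lucknow';

my $json = $incomplete_json;
$json =~ s/"/\\"/g;        # escape embedded double quotes
$json = qq{["$json"]};     # wrap it as a one-element JSON array

my $string = decode_json($json)->[0];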

Untested. You may have to handle other backslashes too. Which solution is simpler depends on how those other backslashes need to be handled.
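A sketch of the same fix-up with JSON::PP, which ships with Perl, swapped in here for JSON::XS on the assumption that the latter may not be installed. The input is a hypothetical variant containing double quotes, to show why they are escaped before wrapping.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

# Hypothetical input: single quotes keep the six characters \u2019 intact,
# and the embedded double quotes would break the JSON if left unescaped.
my $incomplete_json = 'opposite "Spencer\u2019s" Aliganj, Lucknow';

my $json = $incomplete_json;
$json =~ s/"/\\"/g;        # escape embedded double quotes
$json = qq{["$json"]};     # wrap it as a one-element JSON array

my $string = decode_json($json)->[0];
printf "U+%04X\n", ord substr($string, 17, 1);    # prints: U+2019
```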

ikegami