
When I try to decode a Shift-JIS encoded string and encode it back, some of the characters get garbled. I have the following code:

use Encode qw(decode encode);
my $val = <STDIN>;
print "\nbefore decoding: $val";
my $ustr = Encode::decode("shiftjis",$val);
print "\nafter decoding: $ustr";
print "\nbefore encoding: $ustr";
$val = Encode::encode("shiftjis",$ustr);
print "\nafter encoding: $val";

When I use a string such as helloソworld as input, it is decoded and encoded back properly, i.e. the "before decoding" and "after encoding" prints in the code above show the same value. But when I tried another string, such as ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ, the final output was garbled.

Is this a Perl library-specific problem, or is it a general Shift JIS mapping problem? Is there any solution for it?

Sushant

2 Answers


You should simply replace shiftjis with cp932.

http://en.wikipedia.org/wiki/Code_page_932
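
As a minimal sketch of that suggestion (not the asker's exact script), the round trip below uses `cp932` for both directions. The variable names are illustrative, and `Encode::LEAVE_SRC` is added only because `encode` with a CHECK value otherwise consumes its source string, which would break the comparison at the end.

```perl
use strict;
use warnings;
use utf8;                          # the string literal below is UTF-8 source text
use Encode qw(decode encode);

# Roman numerals U+2160..U+2169, the string from the question
my $text = "ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ";

# FB_CROAK dies on unmappable characters instead of silently substituting;
# LEAVE_SRC keeps $text intact so it can be compared afterwards
my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

my $bytes = encode('cp932', $text,  $check);   # characters -> CP932 bytes
my $back  = decode('cp932', $bytes, $check);   # CP932 bytes -> characters

print $back eq $text ? "round trip ok\n" : "round trip failed\n";
```

If the input really comes from a Windows source, decoding it with `cp932` and encoding it with the same name keeps the two directions symmetric.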

yibe
  • Yes - this is a notorious problem where the encoding used in Microsoft Windows isn't really "shift JIS" but CP932. – Apr 02 '11 at 12:51
  • Thanks, it worked very well on Windows. But it is not working on Unix platforms. Do we have to use specific encodings for platforms such as Linux, AIX, etc.? – Sushant Apr 14 '11 at 09:13
  • @Sush Hmm, I have no idea why it doesn't work for you... [`Encode`](http://search.cpan.org/perldoc?Encode) has mapping tables for Japanese encodings (such as [`cp932`](http://cpansearch.perl.org/src/DANKOGAI/Encode-2.42/ucm/cp932.ucm), [`shiftjis`](http://cpansearch.perl.org/src/DANKOGAI/Encode-2.42/ucm/shiftjis.ucm), [etc.](http://search.cpan.org/perldoc?Encode::JP)) and so it should work platform-independently - in fact, I made sure it works properly on Linux. I suspect your problem lies elsewhere. – yibe Apr 15 '11 at 04:45

You lack error-checking.

use utf8;
use Devel::Peek qw(Dump);
use Encode qw(encode);

sub as_shiftjis {
    my ($string) = @_;
    return encode(
        'Shift_JIS',    # http://www.iana.org/assignments/character-sets
        $string,
        Encode::FB_CROAK
    );
}

Dump as_shiftjis 'helloソworld';
Dump as_shiftjis 'ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ';

Output:

SV = PV(0x9148a0) at 0x9dd490
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x930e80 "hello\203\\world"\0
  CUR = 12
  LEN = 16
"\x{2160}" does not map to shiftjis at …
daxim
  • As far as it goes your answer is correct but the actual problem is slightly deeper than that. –  Apr 02 '11 at 12:52
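
The two answers can be combined into one small check. The sketch below is only an illustration: `encodable_in` is a hypothetical helper, not part of `Encode`. It uses `Encode::FB_CROAK` to test whether a string can be represented in a given encoding. Without a CHECK argument, `encode` does not die on unmappable characters but substitutes a replacement character, which is consistent with the garbled output described in the question.

```perl
use strict;
use warnings;
use utf8;                          # string literals below are UTF-8 source text
use Encode qw(encode);

# Hypothetical helper: true if every character of $string maps to $encoding.
# $string is a copy of the argument, so it does not matter that encode()
# consumes its source when a CHECK value is given.
sub encodable_in {
    my ($encoding, $string) = @_;
    return eval { encode($encoding, $string, Encode::FB_CROAK); 1 } ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

for my $string ('helloソworld', 'ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ') {
    for my $encoding ('shiftjis', 'cp932') {
        my $verdict = encodable_in($encoding, $string) ? 'ok' : 'unmappable';
        print "$string in $encoding: $verdict\n";
    }
}
```

Running a check like this on representative input before choosing an encoding name shows which table actually covers the characters in use.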