Running the following code seems to generate the wrong values:
byte[] data = "\u00a5".getBytes("Shift_JIS");
It produces [ -4, -4 ] (0xFC, 0xFC), but I expect [ 0x5C ].
I've tried various alternative names, "Shift-JIS", "shift_jis", "cp932" and all produce the same result.
When I feed the resulting data into the Shift-JIS decoder, I get an exception: java.nio.charset.UnmappableCharacterException: Length: 2
That is, with the decoder configured as follows:
Charset charset = Charset.forName("Shift_JIS");
CharsetDecoder decoder = charset.newDecoder()
.onMalformedInput(CodingErrorAction.REPORT)
.onUnmappableCharacter(CodingErrorAction.REPORT);
But given that the output of the encoder already looks wrong, my guess is that the decoder configuration is irrelevant. My point is that, regardless of the actual bytes, the encoder generates data that the matching decoder can't decode.
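One way to separate "the encoder mapped the character" from "the encoder silently substituted its replacement bytes" is to run the encoder itself in REPORT mode, the same way the decoder above is configured. The following is a minimal sketch of that check; the class name StrictEncodeCheck and the helper encodeStrict are made up for illustration, and the result for Shift_JIS will depend on the platform's charset tables.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncodeCheck {
    // Hypothetical helper: returns the encoded bytes, or null when the
    // charset reports the text as unmappable or malformed.
    public static byte[] encodeStrict(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(text));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // With REPORT, no replacement bytes are written; the failure surfaces here.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("U+00A5 mappable: " + (encodeStrict("\u00a5", "Shift_JIS") != null));
        System.out.println("U+FFE5 mappable: " + (encodeStrict("\uffe5", "Shift_JIS") != null));
    }
}
```

If the first line prints false, the [ -4, -4 ] from getBytes is the replacement sequence rather than a real mapping, since String.getBytes always replaces unmappable characters instead of reporting them.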
The full-width Yen (U+FFE5) encodes to [ -127 (0x81), -113 (0x8F) ], and decodes correctly.
Strangely, if I try to decode [ 92 (0x5C) ], which is what I think the Shift-JIS encoding of the single-width Yen is, the Android/Java decoder produces a backslash (U+005C), leaving the character value as 92.
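For anyone who wants to reproduce the decoding side, here is a rough sketch that decodes a byte array and prints the resulting code points, so a backslash (U+005C) can be told apart from a Yen sign (U+00A5) even when the font renders them alike. The class and method names are only illustrative, and I am not asserting what any particular runtime prints for Shift_JIS.

```java
import java.nio.charset.Charset;

public class DecodeSingleByte {
    // Hypothetical helper: decodes the bytes and lists each code point as U+XXXX.
    public static String decodeAndDescribe(byte[] bytes, String charsetName) {
        String decoded = new String(bytes, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decoded.length(); ) {
            int cp = decoded.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The single byte I expected to be the half-width Yen.
        System.out.println(decodeAndDescribe(new byte[] { 0x5C }, "Shift_JIS"));
        // The two bytes produced for the full-width Yen.
        System.out.println(decodeAndDescribe(new byte[] { (byte) 0x81, (byte) 0x8F }, "Shift_JIS"));
    }
}
```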
If the encoder didn't support a given character, I would expect a replacement character such as '?'. But -4 (0xFC) doesn't even seem to be valid Shift-JIS. It's not even the Unicode replacement character U+FFFD. Using the following line I can see that the encoder seems to be configured to use [-4, -4]:
Charset.forName("Shift_JIS").newEncoder().replacement()
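To make that check easier to repeat, the replacement bytes can be dumped as hex. This is just a small sketch; ReplacementDump and toHex are names I made up, and the value printed for Shift_JIS depends on the runtime.

```java
import java.nio.charset.Charset;

public class ReplacementDump {
    // Hypothetical helper: formats bytes as space-separated uppercase hex pairs.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            // Mask to avoid sign extension of negative byte values.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] replacement = Charset.forName("Shift_JIS").newEncoder().replacement();
        System.out.println("Encoder replacement: " + toHex(replacement));
    }
}
```

For the bytes I observed, toHex(new byte[] { -4, -4 }) gives "FC FC".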
- So why isn't the single-width Yen (U+00A5) mapped in Shift-JIS?
- Is [-4, -4] a sensible encoder replacement?
- Why doesn't the decoder support 0x5C mapping to Yen (U+00A5)?
- If 0x5C is not the correct encoding, what is?