
[Note: question basically re-edited after a lot of playing around]

In Java, you have Charset, defining a character encoding. From a Charset, you can obtain two objects:

  • a CharsetEncoder, to turn a char sequence into a byte sequence;
  • a CharsetDecoder, to turn a byte sequence into a char sequence.

Both of these classes define the methods .onUnmappableCharacter() and .onMalformedInput(). If you set each of these to CodingErrorAction.REPORT, they will throw one of two exceptions: UnmappableCharacterException or MalformedInputException.
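
To be explicit, a minimal sketch of that configuration (class name and printouts are mine):

import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class ReportSetup {
    public static void main(String[] args) {
        Charset charset = Charset.forName("UTF-8");
        // Make both coders throw instead of replacing or ignoring bad input
        CharsetEncoder encoder = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        System.out.println(encoder.malformedInputAction());      // REPORT
        System.out.println(decoder.unmappableCharacterAction()); // REPORT
    }
}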

With a CharsetEncoder, I am able to generate both of them (see the sketch after this list):

  • feed it a CharBuffer containing two high surrogates following one another --> MalformedInputException;
  • feed it a CharBuffer containing a char (or char sequence) which the encoding cannot represent --> UnmappableCharacterException.
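
For instance, a minimal sketch reproducing both encoder-side cases (class and method names are mine):

import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class EncoderErrors {
    public static void main(String[] args) {
        // Two high surrogates in a row: an ill-formed char sequence
        tryEncode("UTF-8", "\uD800\uD800");   // prints MalformedInputException
        // U+00E9 has no representation in US-ASCII
        tryEncode("US-ASCII", "\u00E9");      // prints UnmappableCharacterException
    }

    private static void tryEncode(String charsetName, String input) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(input));
        } catch (CharacterCodingException e) {
            System.out.println(e.getClass().getSimpleName());
        }
    }
}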

With a CharsetDecoder:

  • feed it an illegal byte sequence: MalformedInputException -- easy to do (sketch after this list);
  • UnmappableCharacterException --> how?
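
The easy half as a minimal sketch (class name is mine; 0x80 is a lone UTF-8 continuation byte):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class DecoderMalformed {
    public static void main(String[] args) throws CharacterCodingException {
        // A continuation byte with no lead byte is illegal in UTF-8
        ByteBuffer bytes = ByteBuffer.wrap(new byte[] { (byte) 0x80 });
        // A fresh decoder REPORTs errors by default -> MalformedInputException
        Charset.forName("UTF-8").newDecoder().decode(bytes);
    }
}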

In spite of all my research and a lot of playing around with CharsetDecoder, I just couldn't do it: I could find no combination of Charset and byte sequence able to generate this error...

Is there any at all?

fge
  • map any extended character in a utf charset to ascii. – jtahlborn Feb 25 '14 at 17:51
  • @jtahlborn uhm, what do you call an "extended character"? – fge Feb 25 '14 at 17:52
  • anything outside the ascii charset, like `ü`. – jtahlborn Feb 25 '14 at 17:53
  • @jtahlborn just tried... Still a `MalformedInputException` – fge Feb 25 '14 at 17:54
  • show some example code. – jtahlborn Feb 25 '14 at 17:55
  • where's your code which is attempting to generate the UnmappableCharacterException? – jtahlborn Feb 25 '14 at 18:54
  • ah, now I understand the confusion. You are specifically referring to the `decode()` method, where I was thinking about _encoding_. That exception makes more sense for encoding; I'm unsure how you would generate it when decoding (it's possible they added it to the API just to make it similar to encoding). – jtahlborn Feb 25 '14 at 18:58
  • UnmappableCharacterException doesn't make sense for decoding. It would only be caused during encoding, when you supply a bad character. If an input array gets converted into a bad character, then that falls under the MalformedInputException. – Anubian Noob Apr 06 '14 at 02:26
  • @AnubianNoob I thought the same about `MalformedInputException` when using a `CharsetEncoder`, and yet... – fge Apr 06 '14 at 02:27

2 Answers


It's just a matter of finding a character set with an unmappable byte sequence.

Take, for example, IBM1098. It can't map the hex bytes

0x80
0x81

So put these in a ByteBuffer, rewind it, and try to decode it.

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Test {
    public static void main(String[] args) throws CharacterCodingException {
        // Neither 0x80 nor 0x81 maps to a character in IBM1098
        ByteBuffer buffer = ByteBuffer.allocate(2);
        buffer.put((byte) 0x80);
        buffer.put((byte) 0x81);
        buffer.rewind();
        Charset charset = Charset.forName("IBM1098");
        // A fresh decoder REPORTs malformed/unmappable input by default
        CharsetDecoder decoder = charset.newDecoder();
        decoder.decode(buffer);
    }
}

This throws

Exception in thread "main" java.nio.charset.UnmappableCharacterException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:282)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816)
    at com.test.Test.main(Test.java:16)

Ideone.com attempt.

Sotirios Delimanolis
  • Uuh... OK, I still don't understand this exception very well it seems... Good find! +516 ;) – fge Apr 07 '14 at 04:32
  • @fge It doesn't seem like it applies to all character sets though. For example, [UTF-8 clearly defines](http://en.wikipedia.org/wiki/UTF-8) what is invalid, but says nothing (as far as I can see) about unmappable characters. I would assume that they are all mappable, just that some sequences are invalid, i.e. malformed. – Sotirios Delimanolis Apr 07 '14 at 04:34
  • I surmised as much, but could not find a charset which had this "unmappable" property. You did, however! – fge Apr 07 '14 at 04:38
  • @fge I got all `availableCharsets` and chose among the most random sounding ones :). – Sotirios Delimanolis Apr 07 '14 at 04:39

When you supply a char sequence to the encoder, the encoder can tell that a character cannot be represented in the charset, so it throws an UnmappableCharacterException.

When you supply a byte array to the decoder, it assumes the bytes were encoded properly. Thus when it decodes your byte array and hits a bad sequence, it assumes you have a broken encoder or bad input, which causes a MalformedInputException.
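
A sketch of that asymmetry with US-ASCII (my own example, not from the answer; the JDK's US-ASCII decoder reports high-bit bytes as malformed, which matches what the comments under the question observed):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class AsciiBothWays {
    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        try {
            // 'é' (U+00E9) cannot be represented in US-ASCII
            ascii.newEncoder().encode(CharBuffer.wrap("\u00E9"));
        } catch (CharacterCodingException e) {
            System.out.println(e.getClass().getSimpleName()); // UnmappableCharacterException
        }
        try {
            // 0x80 is not a legal US-ASCII byte
            ascii.newDecoder().decode(ByteBuffer.wrap(new byte[] { (byte) 0x80 }));
        } catch (CharacterCodingException e) {
            System.out.println(e.getClass().getSimpleName()); // MalformedInputException
        }
    }
}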

Anubian Noob
  • I _know_ all that, I can produce `MalformedInputException`s at will. What I want to know is how to make a decoder raise `UnmappableCharacterException` if it is possible at all; and if not, a definitive proof as to why it isn't possible. – fge Apr 06 '14 at 02:36