
I've been scratching my head for a few days now trying to figure out how to correctly encode Scandinavian characters for use in a text field.

These three characters, Æ Ø Å (æ ø å), each show up as �. For example, running this code:

System.out.println("øst");

prints "�st", and I have no idea why.

Here's the code where I redirect System.out to a PrintStream:

System.setProperty("user.language", "da");
OutputStream out = new OutputStream() {
    @Override
    public void write(int b) throws IOException {
        // Turns each single byte into a String and passes it to appendConsole
        appendConsole(new String(new byte[]{(byte) b}, "UTF-8"));
    }
};
PrintStream ps;
try {
    ps = new PrintStream(out, true, "UTF-8");
    System.setOut(ps);
} catch (UnsupportedEncodingException ex) {
    Logger.getLogger(GameController.class.getName()).log(Level.SEVERE, null, ex);
}

If anyone has a solution to this problem, it would be greatly appreciated!

2 Answers


Try casting the int to a char instead of converting it to a byte and then to a String. Does that work?
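A cast maps one byte value to one char, which amounts to reading the bytes as ISO-8859-1. The sketch below (the class CharCastDemo and its helper castEachByte are hypothetical, and it assumes the byte value is masked to 0-255 first) shows what that does to the two UTF-8 bytes of ø: they become two separate characters, so a cast alone cannot rebuild a multi-byte character.

```java
import java.nio.charset.StandardCharsets;

public class CharCastDemo {
    // Hypothetical helper: turns every byte into one char holding its value (0-255),
    // which is an ISO-8859-1 reading of the bytes.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F8 encodes to c3 b8 in UTF-8, followed by 's' and 't'.
        byte[] utf8 = "\u00f8st".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(utf8).length()); // 4 chars, not 3
    }
}
```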

tsatke
  • Nah, exactly the same result, unfortunately :/ – Daniel Holst Dec 08 '15 at 12:48
  • That's actually where I started; casting to byte and using the String constructor was my attempt at fixing it :) – Daniel Holst Dec 08 '15 at 12:49
  • Get rid of this `new String(...)`; it is very resource-intensive :D What about converting your special characters to Unicode escapes? E.g. `ø` would become `\u00f8` (`System.out.println("\u00f8st");`) – tsatke Dec 08 '15 at 19:00
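For reference, a `\u00f8` escape guarantees the string contains U+00F8 regardless of the source file's encoding, but at run time a UTF-8 PrintStream still encodes that character as two bytes, so `write(int)` receives the same byte sequence as with the literal. A quick check, sketched with a hypothetical helper `utf8Hex`:

```java
import java.nio.charset.StandardCharsets;

public class EscapeDemo {
    // Hypothetical helper: hex dump of the UTF-8 bytes for a string,
    // i.e. what a UTF-8 PrintStream hands to the underlying OutputStream.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escape yields U+00F8, which still takes two bytes in UTF-8.
        System.out.println(utf8Hex("\u00f8st")); // c3 b8 73 74
    }
}
```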

When those characters are encoded as UTF-8, they each require two bytes. For example, Æ in UTF-8 is { (byte) 0xc3, (byte) 0x86 }. You cannot construct a String from just one of the bytes; both bytes are required to form a valid UTF-8 sequence.
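The effect can be reproduced without any GUI. This sketch (class and helper names are hypothetical) checks the two bytes of Æ and then decodes each byte on its own, the way the question's `write(int)` does: each half is malformed by itself, so each one turns into U+FFFD, the replacement character that displays as �.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitSequenceDemo {
    // Hypothetical helper: decodes every byte as a separate UTF-8 string,
    // mirroring new String(new byte[]{(byte) b}, "UTF-8") in the question.
    static String decodeEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(new String(new byte[] {b}, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00C6, LATIN CAPITAL LETTER AE
        byte[] ae = "\u00c6".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ae, new byte[] {(byte) 0xc3, (byte) 0x86})); // true

        // Each byte alone is an incomplete sequence and is replaced by U+FFFD.
        System.out.println(decodeEachByte(ae).equals("\uFFFD\uFFFD")); // true

        // Decoding both bytes together restores the character.
        System.out.println(new String(ae, StandardCharsets.UTF_8).equals("\u00c6")); // true
    }
}
```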

You'll need to accumulate bytes until you have enough to form a full UTF-8 sequence, then create a String from that. ByteBuffer and CharsetDecoder are made for that:

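// Members of an OutputStream subclass, e.g. the anonymous class in the question.
// Needs java.io.IOException, java.nio.ByteBuffer,
// java.nio.charset.CharsetDecoder and java.nio.charset.StandardCharsets.

// A UTF-8 sequence for a single character is at most 4 bytes long.
private final ByteBuffer buffer = ByteBuffer.allocate(4);

private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

@Override
public void write(int b)
throws IOException {

    buffer.put((byte) b);

    int size = buffer.position();
    int first = buffer.get(0) & 0xff;

    // The lead byte announces the sequence length:
    // 0xxxxxxx = 1, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4 bytes.
    int expected = first < 0x80 ? 1
                 : first < 0xe0 ? 2
                 : first < 0xf0 ? 3
                 : 4;

    if (size >= expected) {
        // A complete sequence is buffered: decode it, then reuse the buffer.
        buffer.flip();
        appendConsole(decoder.decode(buffer).toString());
        buffer.clear();
    }
}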
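To try the buffering approach outside the game, here is a standalone sketch. The class name `Utf8ConsoleStream` and the `Consumer<String>` sink (standing in for `appendConsole`) are assumptions for illustration; the sequence length is derived from the lead byte, so 3- and 4-byte characters such as € or emoji are also decoded whole.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class Utf8ConsoleStream extends OutputStream {
    // A UTF-8 sequence for a single character is at most 4 bytes long.
    private final ByteBuffer buffer = ByteBuffer.allocate(4);
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Hypothetical sink that receives each decoded character (or surrogate pair).
    private final Consumer<String> sink;

    public Utf8ConsoleStream(Consumer<String> sink) {
        this.sink = sink;
    }

    // Length announced by a UTF-8 lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
    private static int sequenceLength(int lead) {
        if (lead < 0x80) return 1;
        if (lead < 0xe0) return 2;
        if (lead < 0xf0) return 3;
        return 4;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.put((byte) b);
        if (buffer.position() >= sequenceLength(buffer.get(0) & 0xff)) {
            buffer.flip();
            try {
                // Malformed input makes decode() throw a CharacterCodingException
                // (an IOException); the buffer is cleared either way.
                sink.accept(decoder.decode(buffer).toString());
            } finally {
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        StringBuilder console = new StringBuilder();
        PrintStream ps = new PrintStream(new Utf8ConsoleStream(console::append), true, "UTF-8");
        // U+00F8 (2 bytes), ASCII (1 byte) and U+20AC (3 bytes)
        ps.print("\u00f8st \u20ac");
        System.out.println(console.toString().equals("\u00f8st \u20ac")); // true
    }
}
```

Passing the sink through the constructor keeps the stream independent of the GUI; in the question's code it could be created as `new Utf8ConsoleStream(this::appendConsole)`, assuming `appendConsole` takes a single String.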
VGR