
I wrote the following code to convert a string from its detected charset to another one. Please see it below:

    public static String convertCharsetOfTheString(String strToConvert, String targetCharsetName) throws UnsupportedEncodingException {
        CharsetDetector detector = new CharsetDetector();
        detector.setText(strToConvert.getBytes());
        CharsetMatch detect = detector.detect();

        String currentCharsetName = detect.getName();

        Charset currentCharset = Charset.forName(currentCharsetName);
        Charset targetCharset = Charset.forName(targetCharsetName);

        ByteBuffer wrap = ByteBuffer.wrap(strToConvert.getBytes(targetCharsetName)); //.wrap(strToConvert.getBytes());

        CharBuffer decode = currentCharset.decode(wrap);

        ByteBuffer encode = targetCharset.encode(decode);

        return new String(encode.array(), targetCharsetName);
    }

For some symbols I get an encoding/decoding error. For example, the hiragana letter じ becomes unreadable.

I assume this is because a hiragana character takes 3 bytes instead of 2, but I don't know how to fix the problem.

Does anybody know how to fix it?

  • How are you calling this code, e.g. what charset are you trying? What is the output? What output were you expecting? – Josh Lee Aug 23 '18 at 16:55
  • What you are trying to do here does not make any sense. A string is logically a sequence of characters, it does not have an encoding (its internal representation is an implementation detail). Different encodings only come into play if you convert a string to bytes or vice versa. – Henry Aug 23 '18 at 17:19
  • @Henry But it's working – Vladimir Kozhaev Aug 23 '18 at 17:22
  • @Josh Lee I call it with different charsets. The output is UTF-8 – Vladimir Kozhaev Aug 23 '18 at 17:23
  • No, it is not working or you would not ask this question. – Henry Aug 23 '18 at 17:31
  • I mean I get correct strings except for several symbols. For example, hiragana is not encoded correctly – Vladimir Kozhaev Aug 23 '18 at 18:02
  • Why is the source type `String` and not `byte[]`? `String` in Java means UTF-16. And `.getBytes()` uses the user's system's default character encoding! – Tom Blodget Aug 26 '18 at 14:08
  • You still have not explained what you're trying to do, what arguments you are passing to this function, what charset you're attempting to use, or what charset is being used by `getBytes()`. – Josh Lee Aug 30 '18 at 22:06
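
To illustrate the point made in the comments (a `String` has no charset; a charset only matters when converting to or from bytes), here is a minimal sketch. It assumes the input is available as a `byte[]` whose source charset is already known. The class name `CharsetConversionSketch` and the helper `transcode` are hypothetical names used for illustration only, and this is not necessarily the fix for the code in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetConversionSketch {

    // Hypothetical helper: re-encodes raw bytes from a known source charset
    // into a target charset. The String in between is just characters.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source); // bytes -> characters
        return text.getBytes(target);            // characters -> bytes
    }

    public static void main(String[] args) {
        String ji = "\u3058"; // the hiragana letter じ

        // The same character has a different byte length in different encodings.
        byte[] utf8 = ji.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length);  // 3
        System.out.println(utf16.length); // 2

        // Decoding with the charset that produced the bytes restores the character.
        System.out.println(new String(utf16, StandardCharsets.UTF_16BE).equals(ji)); // true

        // Decoding with a different charset yields unrelated characters.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 3

        // Encoding into a charset that cannot represent the character
        // substitutes a replacement byte ('?' for ISO-8859-1).
        byte[] lossy = ji.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) lossy[0]); // ?
    }
}
```

The explicit `Charset` arguments avoid the platform-default behavior of the no-argument `getBytes()` mentioned in the comments.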

0 Answers