
I have a hex byte string such as "D09FD0B5D180D0BDD0B8D0BA", which represents "Перник".

For each letter of the Cyrillic alphabet, I need 2 bytes.

For "П" I need "D0 9F".

If I use:

char letter = (char) 1055; // this is "П"

My problem is how to get the int value 1055 from the hex bytes "D0 9F", or more generally how to convert "D09FD0B5D180D0BDD0B8D0BA" to "Перник".

Most Noble Rabbit
Radoslav
  • First convert the hex string to a byte array. Use this as input for [ByteArrayInputStream](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/ByteArrayInputStream.html) and wrap it in an [InputStreamReader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStreamReader.html#%3Cinit%3E(java.io.InputStream,java.nio.charset.Charset)) with the correct [Charset](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/Charset.html) to read the content as a string. You should learn more about character encodings. – vanje May 28 '21 at 14:24
  • [Guide to Character Encoding](https://www.baeldung.com/java-char-encoding) – vanje May 28 '21 at 14:25
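The approach suggested in the comment above could look roughly like the following. This is only a sketch, assuming the bytes are UTF-8; the class name `HexStreamDecoder` and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class HexStreamDecoder {
    // Turns "D09F..." into the bytes 0xD0, 0x9F, ... (two hex digits per byte).
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reads the bytes through an InputStreamReader; UTF-8 is an assumption here.
    static String decode(String hex) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(hexToBytes(hex)), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("D09FD0B5D180D0BDD0B8D0BA"));
    }
}
```

For a string that is already fully in memory, the reader is more machinery than strictly needed, but the same pattern carries over to decoding bytes that arrive from a file or socket.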

1 Answer


You didn't specify the encoding, but it appears to be UTF-8, so the character П is not encoded as 041F (decimal 1055) but as the two bytes D0 9F (decimal 53407 if read as a single 16-bit number).
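To see how D0 9F relates to 1055, it may help to unpack the two-byte UTF-8 layout by hand: the lead byte has the form 110xxxxx, the continuation byte has the form 10yyyyyy, and the code point is the 11 bits xxxxxyyyyyy. The sketch below does only that. The class and method names are invented for illustration, and it is not a full UTF-8 validator (for example, it does not reject overlong encodings).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf8TwoByteDemo {
    // Decodes a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its code point.
    static int decodeTwoByte(int lead, int continuation) {
        if ((lead & 0xE0) != 0xC0 || (continuation & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a two-byte UTF-8 sequence");
        }
        // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
        return ((lead & 0x1F) << 6) | (continuation & 0x3F);
    }

    public static void main(String[] args) {
        int codePoint = decodeTwoByte(0xD0, 0x9F);
        System.out.println(codePoint); // 1055, i.e. 0x041F

        // Cross-check against the library decoder.
        String s = new String(new byte[] {(byte) 0xD0, (byte) 0x9F}, StandardCharsets.UTF_8);
        System.out.println((int) s.charAt(0)); // 1055
    }
}
```

In practice there is no reason to do this by hand; `new String(bytes, StandardCharsets.UTF_8)` handles every sequence length.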

Note also that UTF-8 is a variable-length encoding, so the assumption of 2 bytes per character may hold for the Cyrillic alphabet but not in general.
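A quick way to check the variable-length point is to count the encoded bytes for a few sample characters. The class name `Utf8LengthDemo` and the choice of samples are mine, not from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf8LengthDemo {
    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 (ASCII letter)
        System.out.println(utf8Length("\u041F"));       // 2 (Cyrillic П)
        System.out.println(utf8Length("\u20AC"));       // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (U+1F600, a surrogate pair in Java)
    }
}
```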

import java.nio.charset.StandardCharsets;

public class Hex2String {
    // Decodes a string of hex digits (two per byte) as UTF-8 text.
    public static String hex2String(String hex) {
        byte[] b = new byte[hex.length() / 2];
        for (int i = 0; i < b.length; i++) {
            // Parse each pair of hex digits into one byte (this parseInt overload needs Java 9+).
            b[i] = (byte) Integer.parseInt(hex, i * 2, i * 2 + 2, 16);
        }
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hex2String("D09FD0B5D180D0BDD0B8D0BA"));
    }
}
Rocco
  • @user15793316 not in UTF-8; Java's default encoding is NOT UTF-8, while the hex string provided by the OP uses this encoding. 1055 is the general Unicode value. – Rocco May 28 '21 at 14:58
  • Alternatively: `byte[] bytes = new BigInteger(hex, 16).toByteArray(); return new String(bytes, bytes[0]==0? 1: 0, hex.length()/2, StandardCharsets.UTF_8);` – Holger May 28 '21 at 15:16
  • @Holger that's definitely nicer! – Rocco May 28 '21 at 15:33
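Spelled out as a complete class, the `BigInteger` variant from the comments might look like this. It is a sketch only, and the class name is made up. One caveat I am adding myself: `BigInteger` drops leading zero bytes, so input that starts with "00" is not handled by this version.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper around the one-liner from the comment above.
class BigIntegerHexDecoder {
    static String decode(String hex) {
        byte[] bytes = new BigInteger(hex, 16).toByteArray();
        // toByteArray() prepends a 0x00 sign byte when the top bit of the first data byte is set.
        int offset = bytes[0] == 0 ? 1 : 0;
        return new String(bytes, offset, hex.length() / 2, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("D09FD0B5D180D0BDD0B8D0BA"));
    }
}
```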