A tool is sending me Japanese content as a byte array.

So, using Java, I have to read that byte array and display the Japanese content.

I have no idea how to achieve this.

So far I have tried the program below, just to check how this conversion works:

String s = "業界支出TXT_20150130170955";
byte b1[];
try {
    b1 = s.getBytes();
    for (int j = 0; j < b1.length; j++) {
        System.out.println(b1[j] + "-----------" + (char) b1[1]);
    }
} catch (UnsupportedEncodingException e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
}

This gives me junk data. I know I am doing this entirely wrong, but I have no idea how to turn a byte stream into Japanese characters.

Any help would be appreciated.

Edit 1:

We need to get the Japanese characters from the "decoded" byte array. I tried the following:

byte[] decoded = Base64.decodeBase64("qzD8MMkwGk/hVClSKHWCaYGJCP/GMK0wuTDIMAn/DQAKAA0ACgApUih1xzD8ML8w1lOXX+VlfgCgUt92l15qdfdTfgCgUt92l15+AClSKHVzijB9fgAakKiMfgB+AKsw/DDJMBpP4VQNVE1Sfg==");
try {
    System.out.println(new String(decoded, "UTF-8") + "\n");
    System.out.println(new String(decoded, "SHIFTJIS") + "\n");
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

But we are not getting the expected results. Please advise.

Onki

1 Answer

To convert a byte array to a String, you should use the String(byte[] bytes, Charset charset) constructor.

To properly decode the bytes into a sequence of characters, you have to know the character encoding in which to interpret the bytes. The most common is UTF-8.

Example:

// Bytes of UTF-8 encoded Japanese word: "そこ" (there)
byte[] data = new byte[]{-29, -127, -99, -29, -127, -109};

String s = new String(data, StandardCharsets.UTF_8);
System.out.println(s);

Output:

そこ
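
The same idea can be applied to the Base64 sample in the question to see how much the choice of charset matters. The following is only a minimal sketch, assuming Java 8 or later (for java.util.Base64; the question uses a different decoder, Base64.decodeBase64). The class name CharsetProbe and the helper decodeAs are hypothetical names chosen for illustration. It takes the first eight Base64 characters of the sample (six bytes) and interprets them with a few candidate charsets:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class for illustration; not part of any library.
final class CharsetProbe {

    // Decodes Base64 text to raw bytes, then interprets those bytes in the given charset.
    static String decodeAs(String base64, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // First 8 Base64 characters of the sample in the question (6 bytes: AB 30 FC 30 C9 30).
        String prefix = "qzD8MMkw";
        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16BE
        };
        // Print each interpretation so the readable one can be picked by eye.
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " -> " + decodeAs(prefix, cs));
        }
    }
}
```

For these six bytes, the UTF-16LE interpretation is カード (U+30AB U+30FC U+30C9), which hints that the sample may be UTF-16LE rather than UTF-8 or Shift_JIS. That is only an inference from the bytes themselves; whatever the sending tool documents as its encoding remains the authoritative answer.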

The reverse conversion (String => byte[]) can be done with the byte[] String.getBytes(Charset charset) method:

String s = "そこ";
byte[] data = s.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.toString(data));

Which prints:

[-29, -127, -99, -29, -127, -109]

Final note

Avoid the String constructor that takes only a byte array, and the no-argument String.getBytes() method. Converting between a String and a byte[] always requires an encoding, so if you don't specify one, the platform's default encoding is used. That default can vary from platform to platform, or even from run to run, so your code becomes unportable (it could behave differently on different machines).
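
The portability point above can be checked with a small sketch. It assumes Java 7 or later (for StandardCharsets); the class name ExplicitCharsetDemo and the helper roundTrip are hypothetical. It prints the default charset that the no-argument forms would silently use, then shows that the same text produces different bytes under different explicit charsets:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration; not part of any library.
final class ExplicitCharsetDemo {

    // Encodes and then decodes the text with the same explicit charset.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // The two hiragana characters from the example above, written as Unicode escapes
        // so the source file's own encoding does not matter.
        String s = "\u305D\u3053";

        // This is what new String(bytes) and getBytes() would use implicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same text, different byte sequences depending on the charset.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE)));

        // With a matching explicit charset on both sides, the round trip is lossless.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));
    }
}
```

The decoding side has to name the same charset the encoding side used; bytes written as UTF-8 and read as UTF-16LE do not give back the original text.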

For Java prior to 7.0

If you use a Java version prior to 7.0, StandardCharsets is not available. In that case, use the constructor and getBytes() overloads that take the charset as a String instead of a Charset. You have to provide the name of the charset:

String(byte[] bytes, String charsetName)

byte[] getBytes(String charsetName)

Example:

// Both calls declare the checked UnsupportedEncodingException.

// From String to byte array:
String s = "そこ";
byte[] data = s.getBytes("UTF-8");

// From byte array to String:
String decoded = new String(data, "UTF-8");
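
Because the String-based overloads throw the checked UnsupportedEncodingException, callers need a try/catch or a throws clause. One possible way to wrap that is sketched below. It uses only APIs available before Java 7; the class name LegacyCharsetExample and the helper decode are hypothetical, and converting the checked exception to IllegalArgumentException is just one design choice among several:

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class for illustration; not part of any library.
final class LegacyCharsetExample {

    // Decodes bytes using a charset given by name (the pre-Java 7 style).
    static String decode(byte[] data, String charsetName) {
        try {
            return new String(data, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Thrown when the JVM does not support the named charset.
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the two-character example used earlier in this answer.
        byte[] data = {-29, -127, -99, -29, -127, -109};
        System.out.println(decode(data, "UTF-8"));
    }
}
```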
icza
  • Your solution resolved half of my problem, but I am still stuck on another issue: I am getting the data as a byte array. So how can I convert my byte array to the byte format you get from getBytes(StandardCharsets.UTF_8)? – Onki Jan 30 '15 at 14:27
  • @user3610891 The `getBytes()` method is used to convert an already created `String` object to bytes. If you have a byte array, you can create a `String` object from it using the constructor `String(byte[] bytes, Charset charset)`. – icza Jan 30 '15 at 14:31
  • Hi @icza, thanks for the help! StandardCharsets is only supported from Java 1.7, so could you please help us get the same implementation in Java 1.6? Thanks again for your help. – Onki Feb 02 '15 at 09:31
  • @user3610891 If you use Java 1.6, you can use the constructor and `getBytes()` method which takes the charset as a `String` and not as a `Charset`. You have to provide the charset _name_ e.g. `"UTF-8"`. – icza Feb 02 '15 at 09:33
  • @user3610891 Edited my answer to add links and usage example. – icza Feb 02 '15 at 09:37