Why does "STRING".getBytes() work differently depending on the operating system?

Question

I am running the code below and getting a different result from "some_string".getBytes() depending on whether I am on Windows or Unix. The issue happens with any string (I tried a very simple ABC and saw the same problem).

See the differences below printed in console.

The code below was tested with Java 7. If you copy it in full, it will run.

Additionally, see the difference in hexadecimal in the images below. The first two images show the file created on Windows, with the hexadecimal values in ANSI and EBCDIC respectively. The third image, the black one, is from Unix. It shows the hexadecimal (-c option) and the readable characters, which I believe are EBCDIC.

So, my direct question is: why does this code behave differently when I am using Java 7 in both cases? Should I check a specific property somewhere? Maybe Java on Windows picks up one default format and on Unix another. If so, which property must I check or set up?

Unix Console:

$ ./java -cp /usr/test.jar test.mainframe.read.test.TestGetBytes
H = 76 - L
< wasn't found

Windows Console:

H = 60 - <
H1 = 69 - E
H2 = 79 - O
H3 = 77 - M
H4 = 62 - >
End of Message found

The entire code:

package test.mainframe.read.test;

import java.util.ArrayList;

public class TestGetBytes {

       public static void main(String[] args) {
              try {
                     ArrayList ipmMessage = new ArrayList();
                     ipmMessage.add(newLine());

                     //Windows Path
                     writeMessage("C:/temp/test_bytes.ipm", ipmMessage);
                     reformatFile("C:/temp/test_bytes.ipm");
                     //Unix Path
                     //writeMessage("/usr/temp/test_bytes.ipm", ipmMessage);
                     //reformatFile("/usr/temp/test_bytes.ipm");
              } catch (Exception e) {

                     System.out.println(e.getMessage());
              }
       }

       public static byte[] newLine() {
              return "<EOM>".getBytes();
       }

       public static void writeMessage(String fileName, ArrayList ipmMessage)
                     throws java.io.FileNotFoundException, java.io.IOException {

              java.io.DataOutputStream dos = new java.io.DataOutputStream(
                           new java.io.FileOutputStream(fileName, true));
              for (int i = 0; i < ipmMessage.size(); i++) {
                     try {
                           int[] intValues = (int[]) ipmMessage.get(i);
                           for (int j = 0; j < intValues.length; j++) {
                                  dos.write(intValues[j]);
                           }
                     } catch (ClassCastException e) {
                           byte[] byteValues = (byte[]) ipmMessage.get(i);
                           dos.write(byteValues);
                     }
              }
              dos.flush();
              dos.close();

       }

       // reformat to U1014
       public static void reformatFile(String filename)
                     throws java.io.FileNotFoundException, java.io.IOException {
              java.io.FileInputStream fis = new java.io.FileInputStream(filename);
              java.io.DataInputStream br = new java.io.DataInputStream(fis);

              int h = br.read();
              System.out.println("H = " + h + " - " + (char)h);

              if ((char) h == '<') {// Check for <EOM>

                     int h1 = br.read();
                     System.out.println("H1 = " + h1 + " - " + (char)h1);
                     int h2 = br.read();
                     System.out.println("H2 = " + h2 + " - " + (char)h2);
                     int h3 = br.read();
                     System.out.println("H3 = " + h3 + " - " + (char)h3);
                     int h4 = br.read();
                     System.out.println("H4 = " + h4 + " - " + (char)h4);
                     if ((char) h1 == 'E' && (char) h2 == 'O' && (char) h3 == 'M'
                                  && (char) h4 == '>') {
                           System.out.println("End of Message found");
                     }
                     else{
                           System.out.println("EOM not found but < was found");
                     }
              }
              else{
                     System.out.println("< wasn't found");
              }
       }
}

score 3 · Accepted Answer · answered Apr 20 '16 at 01:12

You are not specifying a charset when calling getBytes(), so it uses the default charset of the underlying platform (or of Java itself if specified when Java is started). This is stated in the String documentation:

public byte[] getBytes()

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
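To see which charset the no-argument getBytes() picks up on a given machine, a minimal sketch along these lines can be run on both systems. The class name ShowDefaultCharset and its helper method are hypothetical, chosen only for illustration. The printed value depends on the platform and on how the JVM was started, so no particular output is assumed here.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Returns the name of the charset that "...".getBytes() uses when no
    // charset argument is passed.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The value differs between machines; compare the Windows and Unix runs.
        System.out.println("Default charset: " + defaultCharsetName());
        // The file.encoding system property usually reflects the same setting.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```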

getBytes() has an overloaded version that lets you specify a charset in your code.

public byte[] getBytes(Charset charset)

Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.
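As a rough sketch of that overload applied to the marker from the question, the example below encodes "<EOM>" with an explicit charset. The class name ExplicitCharsetDemo and the describe helper are hypothetical. UTF-8 is used only as an example; the right charset depends on what the file's consumer expects. The IBM1047 branch is guarded because that EBCDIC charset is not guaranteed to be present in every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Encodes the marker with a fixed charset, so the resulting bytes do not
    // depend on the platform default.
    static byte[] endOfMessage() {
        return "<EOM>".getBytes(StandardCharsets.UTF_8);
    }

    // Formats the bytes as unsigned decimal values separated by spaces.
    static String describe(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same bytes on every platform: 60 69 79 77 62
        System.out.println("UTF-8:   " + describe(endOfMessage()));

        // Optional comparison with an EBCDIC charset, only if this runtime ships it.
        if (Charset.isSupported("IBM1047")) {
            byte[] ebcdic = "<EOM>".getBytes(Charset.forName("IBM1047"));
            System.out.println("IBM1047: " + describe(ebcdic));
        }
    }
}
```

With UTF-8 the first byte is 60, which is what the question's reformatFile check for '<' expects. Where IBM1047 is available, the second line should show different values for the same string, which is consistent with the mismatch seen in the Unix run.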

answered Apr 20 '16 at 01:12

Remy Lebeau


2

On Windows the default character set is quite likely cp1252 (or a country-specific character set). Many Linux systems use UTF-8. On the mainframe I presume it would be EBCDIC – Bruce Martin Apr 20 '16 at 02:38
1

cp1252 is *likely* only in Western countries. There are many ANSI charsets used around the world, and many Windows codepages for them. Don't assume cp1252; be explicit based on your actual needs. – Remy Lebeau Apr 20 '16 at 02:44
Thanks. It is my first time working with a mainframe, and honestly I have never needed to set up a character set in the Windows world. I fixed the issue after your suggestion, and this really is the answer to my question. BTW, if you can suggest an article or a question on this forum that explains what a character set really means, I will be thankful. On the mainframe I work with, the character set is IBM1047 (I got this by printing java.nio.charset.Charset.defaultCharset()). The issue was fixed by passing org.apache.commons.io.Charsets.UTF_8 to getBytes(). I found it strange that I couldn't find the same constant in java.nio. – Jim C Apr 20 '16 at 20:09
1

@JimC: Java has a [`java.nio.charset.StandardCharsets`](https://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html) class, you can use `StandardCharsets.UTF_8`. The [`org.apache.commons.io.Charsets` documentation](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/Charsets.html#UTF_8) even states: "*Deprecated. Use Java 7's StandardCharsets*". – Remy Lebeau Apr 20 '16 at 21:19
1

@JimC: As for an article on charsets, see [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html). – Remy Lebeau Apr 20 '16 at 21:21
