Java comparator for String in EBCDIC encoding

Question

I have a requirement to convert a string to EBCDIC encoding and then sort it. We need to sort in EBCDIC order because the string is sent to a mainframe. The strings I will sort contain only uppercase letters and digits.

I searched for this and found a page from IBM that lists the characters in order.

What I realized is that EBCDIC sorting is exactly the opposite of normal Java lexicographic sorting (at least for the type of data I am going to process).

My question is: is my realization right? If not, what am I missing? Or is there a Java comparator available for EBCDIC encoding?

score 5 · Answer 1 · answered Jul 02 '14 at 07:26

You should not spend much time figuring out the many peculiarities of EBCDIC. Given the limited scope of your problem, a simple approach to implementing your requirements is as follows:

1. Implement a helper method that reads EBCDIC and produces java.lang.String in Java's native encoding (UTF-16).
2. Implement a helper method that takes java.lang.String in Java's native encoding (UTF-16) and produces an EBCDIC-encoded string.
3. Use the first method to read the data. Sort and do other processing as needed. Use the second method to write the data to the mainframe.

This approach has an advantage that only two pieces of your code need to understand EBCDIC - the one that converts in, and the one that converts out. All other code can use Java system libraries and any libraries that you have for sorting, filtering, searching, and all other processing, without thinking about the EBCDIC encoding at all.
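The two helper methods could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `EbcdicCodec` and the method names `fromEbcdic` and `toEbcdic` are hypothetical, and `IBM1047` is assumed to be the code page the mainframe uses (other EBCDIC code pages such as `IBM037` or `cp500` may apply instead).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EbcdicCodec {
    // Assumption: IBM1047 is the EBCDIC code page in use; adjust as needed.
    private static final Charset EBCDIC = Charset.forName("IBM1047");

    // Decode EBCDIC bytes into an ordinary Java (UTF-16) String.
    public static String fromEbcdic(byte[] ebcdicBytes) {
        return EBCDIC.decode(ByteBuffer.wrap(ebcdicBytes)).toString();
    }

    // Encode an ordinary Java String into EBCDIC bytes.
    public static byte[] toEbcdic(String text) {
        return text.getBytes(EBCDIC);
    }

    public static void main(String[] args) {
        byte[] encoded = toEbcdic("11AA");
        // Print each byte as an unsigned hex value, then the decoded round trip.
        for (byte b : encoded) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
        System.out.println(fromEbcdic(encoded));
    }
}
```

Everything between the two calls works on plain `String` values, so only these two methods need to know which code page is involved.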

Thanks for the answer. Can you please help with point 1? For example, I receive the string from the UI as "11AA". Can you please tell me how to proceed? – Akshay, Jul 02 '14 at 08:03
@Akshay Make a [`ByteBuffer`](http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html) and populate it with EBCDIC bytes. Create a [`Charset`](http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html) for EBCDIC using `Charset.forName("IBM1047")`. Use `encode` and `decode` on your `Charset` object to convert to and from EBCDIC. – Sergey Kalinichenko, Jul 02 '14 at 08:12

score 5 · Accepted Answer · answered Jul 02 '14 at 07:39

Since the char type is implicitly UTF-16 in Java, EBCDIC strings need to be compared as Java byte arrays.

Example:

    Charset encoding = Charset.forName("IBM1047");
    Comparator<String> encComparator = (s1, s2) ->
            encoding.encode(s1)
                    .compareTo(encoding.encode(s2));

McDowell

According to [IBM's EBCDIC docs](http://www.ibm.com/support/knowledgecenter/en/SSGH4D_14.1.0/com.ibm.xlf141.aix.doc/language_ref/asciit.html), punctuation marks should sort before letters, but the above method gives the opposite result. What I'm noticing while implementing it the way you recommended is that the first 128 characters in EBCDIC are given higher values, while the next 128 characters get lower values. Could this be because the encoded bytes are signed in Java? If so, how can I fix this? If not, what else am I missing? – DetourToNirvana Feb 07 '17 at 16:34
Yes, it is because bytes are signed in Java. Byte values 128 to 255 are treated as negative and therefore sort before byte values 0 to 127. – Bruce Martin Sep 28 '17 at 23:32
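One way to get around the signed-byte issue described in these comments is to compare the encoded bytes as unsigned values. The following is a minimal sketch under stated assumptions: the class name `UnsignedEbcdicSort` and its members are hypothetical, and `IBM1047` is assumed to be the relevant code page.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class UnsignedEbcdicSort {
    // Assumption: IBM1047 matches the mainframe's code page.
    static final Charset EBCDIC = Charset.forName("IBM1047");

    // Lexicographic comparison that treats every byte as a value in 0..255.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // "& 0xFF" widens the byte to an int without sign extension.
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // All shared positions are equal: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static final Comparator<String> EBCDIC_ORDER =
            (s1, s2) -> compareUnsigned(s1.getBytes(EBCDIC), s2.getBytes(EBCDIC));

    public static void main(String[] args) {
        List<String> values = new ArrayList<>(Arrays.asList("1A", "A1", "-A", " A"));
        values.sort(EBCDIC_ORDER);
        // Space and hyphen sort before letters, and letters before digits.
        System.out.println(values);
    }
}
```

On Java 9 and later, `Arrays.compareUnsigned(byte[], byte[])` performs the same comparison and could replace the hand-written loop.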

score 1 · Answer 3 · answered Sep 27 '17 at 12:39

Yes, there is a comparator for EBCDIC encoding. Here is the code for it.

    // EntityClassName is a placeholder: substitute your own entity class.
    Comparator<EntityClassName> EBCDIC = new Comparator<EntityClassName>() {
        private final Charset encoding = Charset.forName("cp500");

        @Override
        public int compare(EntityClassName jc1, EntityClassName jc2) {
            // Encode both string forms to EBCDIC and compare the byte buffers.
            return encoding.encode(jc1.toString())
                    .compareTo(encoding.encode(jc2.toString()));
        }
    };

score 0 · Answer 4 · answered Oct 11 '22 at 16:37

It has been suggested to use Charset.encode(String) and compare the resulting ByteBuffer objects. This only works for letters and digits, not punctuation, because byte has a range of -128 to 127. Byte values above 127 become negative, so they don't compare properly with the positive bytes.

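To handle the entire character set, decode the EBCDIC bytes as ISO-8859-1 before comparing them. ISO-8859-1 maps each byte value from 0 to 255 to the char with the same value, so String comparison then follows unsigned byte order. Decoding with the platform default charset is not reliable: when the default is UTF-8, most EBCDIC letter and digit bytes are malformed input and decode to the same replacement character.

    Charset ebcdicCharset = Charset.forName("IBM037");
    Comparator<String> ebcdicComparator = Comparator.comparing(
            value -> new String(value.getBytes(ebcdicCharset), StandardCharsets.ISO_8859_1));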
