Issue processing a mainframe file ... encoding not working

Question

http://www.2shared.com/document/VqlJ-1wF/test.html

Which encoding is this file in?
What's the best way to read it in Java?

Currently I have

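import java.io.File;
import java.util.Scanner;

// Reads the file as text, line by line, assuming the IBM850 code page.
Scanner scanner = new Scanner(new File("test.txt"), "IBM850");

while (scanner.hasNextLine()) {
  StringBuffer buffer = new StringBuffer(scanner.nextLine());
  System.out.println("BUFFER = " + buffer.toString());
}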

This prints a lot of nulls and garbage. What's the right encoding to use?

Are you sure this (or part of it) isn't a bitmap? Looking at the hex codes, it seems like some of the middle is more like an image than text. — Jeremiah Willcock, Mar 08 '11 at 17:54
It has a header in ASCII, meaning it likely isn't EBCDIC; "CODE123" is a type of bar code -- are you sure it isn't that? — Jeremiah Willcock, Mar 08 '11 at 18:09
I think you'd need to show us what the input file contains. Do an od -xc on the file and paste the results. Unless it is just EBCDIC – Archimedes Trajano, Mar 08 '11 at 18:12
There seem to be several objects concatenated together in there -- there are several "CODE123" blocks that seem to have similar content formats. — Jeremiah Willcock, Mar 08 '11 at 18:19

Lawrence Dol · Answer 1 · 2011-03-08T18:36:21.170

I have extensive experience with moving data between PCs and IBM midrange systems. I can tell that the file is definitely not (pure) EBCDIC. At the beginning of each "line" are the ASCII characters:

CODE12312345678901502G830918

The likelihood of any EBCDIC characters matching that sequence, never mind the same sequence on all three lines, is infinitesimally small.

My best bet would be an ASCII lead-in (or already translated EBCDIC) followed by binary data. If it's been translated, the binary part is almost certainly corrupted.

I may have more info shortly after I examine it in hex.

Each "record" is separated with hex 0D 0A 0D 0A, which are a pair of CRLF sequences.

I think you most likely have a fixed-field flat file format with the text fields in ASCII and the other fields in binary.
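If the file really is an ASCII lead-in followed by binary fields, one way to look at it is to read raw bytes and decode only the text part. The following is a minimal sketch, not a parser for the actual format: the class name is made up, and the 28-byte header length is an assumption taken from the lead-in quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class MixedRecordDump {
    // Assumption: the ASCII lead-in is the 28 characters quoted above.
    static final int HEADER_LENGTH = 28;

    /** Decodes only the leading ASCII text of a record. */
    static String header(byte[] record) {
        int n = Math.min(HEADER_LENGTH, record.length);
        return new String(record, 0, n, StandardCharsets.US_ASCII);
    }

    /** Renders the bytes after the header as hex instead of forcing them through a charset. */
    static String binaryPartAsHex(byte[] record) {
        StringBuilder sb = new StringBuilder();
        for (int i = Math.min(HEADER_LENGTH, record.length); i < record.length; i++) {
            sb.append(String.format("%02x", record[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: MixedRecordDump <file>");
            return;
        }
        // Read raw bytes; a Scanner with a charset would mangle the binary fields.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("HEADER = " + header(data));
        System.out.println("REST   = " + binaryPartAsHex(data));
    }
}
```

Splitting on the 0D 0A 0D 0A separator is deliberately left out, since the same byte values could also occur inside a binary field.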

Thanks everyone. You all improved my understanding of EBCDIC quite a bit! – mat, Mar 08 '11 at 21:19

score 1 · Answer 2 · answered Mar 08 '11 at 18:20

Typically IBM mainframe data is stored in one of the regional flavors of EBCDIC character encodings, such as Cp037 in the US or the multilingual Cp870.

GregA100k

How can I split this by delimiter (STX, ETX), etc.? – mat Mar 08 '11 at 18:27

score 1 · Answer 3 · answered Mar 08 '11 at 18:27

It's definitely NOT EBCDIC-encoded (I spent the '70s and '80s working on IBM mainframes, so I recognize EBCDIC :-). It appears to be ASCII with some binary components. The only way to properly interpret this is for the provider to give you a mapping that describes each record type (there may be one or more than one) and indicates the data types of the embedded binary objects.

Bruce Martin · Answer 4 · 2011-03-12T12:34:17.510

By the looks of it, you have taken a binary mainframe file and done an ASCII conversion on it when transferring it to the PC. This will not work.

To illustrate what goes wrong, consider a 2-byte binary integer field with a value of 64 (X'0040'). It will be converted to 32 (X'0020') because X'40' is also the EBCDIC space character, and the ASCII converter turns every EBCDIC space into an ASCII space (X'20'). You really want binary and Packed-Decimal fields left alone.
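The effect on that 2-byte field can be reproduced in a few lines of Java. This is a toy sketch: the class and method names are made up, and the translation models only the space mapping (X'40' to X'20'), not a full EBCDIC-to-ASCII table.

```java
final class SpaceTranslationDemo {
    /** Toy stand-in for a text-mode transfer: only EBCDIC space (0x40) to ASCII space (0x20) is modelled. */
    static byte[] translateSpaces(byte[] ebcdic) {
        byte[] out = ebcdic.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == 0x40) {
                out[i] = 0x20;
            }
        }
        return out;
    }

    /** Reads a 2-byte big-endian binary integer (treated as unsigned for simplicity). */
    static int halfword(byte[] b) {
        return ((b[0] & 0xff) << 8) | (b[1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x40};               // binary 64 as stored on the mainframe
        byte[] converted = translateSpaces(original); // what a text-mode transfer leaves behind
        System.out.println(halfword(original));       // 64
        System.out.println(halfword(converted));      // 32
    }
}
```

A real converter remaps nearly every byte value, so Packed-Decimal fields are damaged in the same way.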

You have 3 options:

Convert all the Comp-3 / binary fields to text on the mainframe (Cobol, sort, Easytrieve, etc. can do this), then do the transfer.
Do a binary transfer to the PC and write a program to read the file. The Java package JRecord (http://jrecord.sourceforge.net/) can read and write mainframe files.
Do a binary transfer and use a utility like the RecordEditor (http://record-editor.sourceforge.net/Record04.htm) to read it. The RecordEditor can read mainframe files and save them as CSV or fixed-width ASCII files, and it can use a Cobol copybook to view the file.

What I can tell you is the file is 2000 bytes long on the mainframe and contains a lot of Packed-Decimal fields (Cobol Comp-3).

I have decoded the first 120 bytes of the first record:

Field     start     length   Value                    Hex Representation
n0        1         4        CODE                     434f4445        
n1        5         17       12312345678901502        3132333132333435363738393031353032       
n2        22        1        G                        47        
n3        23        6        830918                   383330393138        
n4        29        1        V                        56        
n5        30        3        2470                     02470f        
n6        33        4        0                        0000000f        
n7        37        3        2470                     02470f        
n8        40        2        09                       3039        
n9        42        5        290502                   000290502c        
n10       47        5        10842                    000010842c        
n11       52        5        279660                   000279660c        
n12       57        5        19072                    000019072c        
n13       62        5        11488                    000011488c        
n14       67        5        0                        000000000c        
n15       72        4        0                        0000000c        
n16       76        4        0                        0000000c        
n17       80        7        439914                   0000000439914c        
n18       87        7        0                        0000000000000c        
n19       94        7        0                        0000000000000c        
n20       101       4        7588                     0007588c        
n21       105       4        7588                     0007588c        
n22       109       4        0                        0000000c        
n23       113       4        0                        0000000c        
n24       117       5        0                        000000000c        

Where: 
Start  - Field start (byte number)
length - Field length (in bytes)
Value  - Field value
Hex representation - How the field is stored in the file in hex
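For reference, the Packed-Decimal (Comp-3) values in the table can be recovered from their hex representation with a small decoder. This is a minimal sketch with made-up class and method names. It assumes an intact field (two digits per byte, sign in the last nibble) and does not validate the digit nibbles.

```java
final class PackedDecimal {
    /** Decodes a Comp-3 field: two digits per byte, with the sign in the low nibble of the last byte. */
    static long unpack(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0f;
            int lo = field[i] & 0x0f;
            value = value * 10 + hi;
            if (i < field.length - 1) {
                value = value * 10 + lo;
            } else if (lo == 0x0d || lo == 0x0b) {
                value = -value; // D (or B) marks a negative value; C and F are positive or unsigned
            }
        }
        return value;
    }

    /** Helper for the demo: turns a hex string such as "02470f" into bytes. */
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unpack(hex("02470f")));     // field n5: 2470
        System.out.println(unpack(hex("000290502c"))); // field n9: 290502
    }
}
```

This only works on bytes from a binary transfer. After an ASCII conversion the nibbles no longer hold the original digits.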

score 0 · Answer 5 · answered Aug 28 '18 at 18:26

Use the cp1047 charset, as below.

BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "cp1047"));

Smart Coder

Won't work in this case; the file contained binary data corrupted by an EBCDIC-to-ASCII conversion – Bruce Martin Aug 29 '18 at 06:28
